-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid parallel creation of virtual environment and pip upgrade (fix i… #166
Conversation
…ssue OCR-D#165) Signed-off-by: Stefan Weil <[email protected]>
PR #164 and PR #166 seem to fix the parallel build issues. I still see an exception, but it looks like all processors are built nevertheless in less than 5 minutes:
Still nagging are the repeated tesseract builds (see PR #121), but they now take only 35 seconds with the parallel build. |
$(SEM) $(PYTHON) -m venv $(VIRTUAL_ENV) | ||
. $(ACTIVATE_VENV) && $(SEM) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to use sem
for this as well. But we should not mix the semaphore for git submodule action with that for venv modifications. Also, sub-venvs should be different again. And we should not only protect the initial creation, but any pip install
usage as much as we can.
$(SEM) $(PYTHON) -m venv $(VIRTUAL_ENV) | |
. $(ACTIVATE_VENV) && $(SEM) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools | |
$(SEMPIP) $(PYTHON) -m venv $(VIRTUAL_ENV) | |
. $(ACTIVATE_VENV) && $(SEMPIP) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools |
with SEMPIP
defined in similar fashion to the SEM
for git, only with --id ocrd_all_$(VIRTUAL_ENV)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, but think that using new semaphores is overkill because the protected actions are rather short. Typically the module updates will be finished when sub-venvs are created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should not only protect the initial creation, but any pip install usage as much as we can.
I don't understand that part. Are there problems with parallel instances of pip install
? I would have expected that pip
can handle that. Parallel installation is desired, because it can save much time. Upgrading pip
is a very special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, but think that using new semaphores is overkill because the protected actions are rather short. Typically the module updates will be finished when sub-venvs are created.
Don't be so sure. Module checkouts can take a while for large/recursive repos. That's time wasted for others with large Python dependencies (Tensorflow etc). Especially since sub-venvs are completely independent of each other.
The cost of distinguishing SEMGIT
and SEMPIP
is minimal, and it maximizes parallel time available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should not only protect the initial creation, but any pip install usage as much as we can.
I don't understand that part. Are there problems with parallel instances of
pip install
? I would have expected thatpip
can handle that. Parallel installation is desired, because it can save much time. Upgradingpip
is a very special case.
Yes, exactly. I don't think pip does file locking. You're right that pip updating itself is more sensitive, but I guess there are potential problems elsewhere, too (esp. after #142 makes pip intolerant of temporary inconsistencies). Have not tried/analysed it though. (And no, never saw make all -j
fail like that either. But perhaps I have just not tried hard enough.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically the module updates will be finished when sub-venvs are created.
Don't be so sure. Module checkouts can take a while for large/recursive repos. That's time wasted for others with large Python dependencies (Tensorflow etc).
See the new issue #169 for my answer. Yes, we can optimize. But currently introducing new semaphores does not result in faster builds. I just tested that. That might change when #169 is solved. I guess that new semaphores will reduce the build time by less than 5 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. So let's keep this in mind and check again. Yes, probably very unlikely that multiple modules want to install the same dependency at the same time. But even if so, as long as they picked the same version, it's probably going to work...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I don't see the harm in using different semaphores for different purposes but if the gains are inconsequential, it doesn't matter.
@bertsky, can we merge this (and think about optimization later)? |
agreed. |
…ssue #165)
Signed-off-by: Stefan Weil [email protected]