# Notify feature for non-interactive scripts is unreliable on large number of jobs [1000+] #18
Hi @Asumerodi, and thanks for your report. I find it a bit hard to understand exactly how this code is executing; could you post a more complete sample? Also, what version of zsh are you running? There are problems on at least 5.0.2 and 5.0.8 that I can't do anything about. Furthermore, the use of kill signals (before zsh 5.2) is not optimal for a lot of simultaneous job execution. One thing to keep in mind is that eventually the shell will not be able to create more forks, so there's a limit to how many async jobs can run simultaneously, especially if your job creates new forks. You could try changing this line to redirect errors inside the worker to a file (see the sketch below). Sorry I've got nothing concrete for you yet, I need to understand the problem you're having better first 😄.
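A minimal illustration of that suggestion, assuming the line in question is where async.zsh spawns the worker (the exact command differs between versions, and /tmp/err.log is just an example path):

```zsh
# Illustrative only: append a stderr redirection to the command in async.zsh
# that starts the worker, so errors inside the worker are kept in a file
# instead of being discarded, e.g.:
#
#   zpty -b $worker _async_worker ... 2>>/tmp/err.log
```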
Re: complete sample — you don't really need to share your business logic; code that runs with a "noop" function would help.
Actually, I might in fact understand the problem here. Since you're using the notify feature, I recommend you change a few bits:

```zsh
test_large_number_of_jobs() {
  readonly -i cores=4
  readonly -i files=1000
  integer COMPLETED_JOBS=0
  integer i j goo

  completed_callback() {
    print cb $@
    COMPLETED_JOBS=$((COMPLETED_JOBS + 1))
  }

  convert() {
    print $1
  }

  async_start_worker async_conversion

  for (( i=1, j=cores; i <= (files + cores); i+=cores, j+=cores )); do
    print $i $j
    if [[ $j -gt $files ]]; then
      j=$files
    fi
    for (( goo=i; goo <= j; goo++ )); do
      print $i $j $goo
      if [[ $goo -le $files ]]; then
        async_job async_conversion convert $goo
      fi
    done
    # Poll for results instead of relying on notifications.
    while (( COMPLETED_JOBS < j )); do
      sleep 0.001
      async_process_results async_conversion completed_callback
      echo $COMPLETED_JOBS -- $j
    done
  done

  async_stop_worker async_conversion
}
```

I changed some things to keep it simple when writing the test function, but the biggest change is that we don't rely on the notify feature at all; instead we poll with `async_process_results` until each batch is done.
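For completeness, a minimal harness to run the test function above (a sketch; it assumes async.zsh sits in the current directory, so adjust the source path to wherever zsh-async is installed):

```zsh
#!/usr/bin/env zsh
# Load zsh-async, initialize it, then run the test function defined above.
source ./async.zsh
async_init

test_large_number_of_jobs
```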
I know the code is ugly and I was even reluctant to post it in that form, and I appreciate the workaround; I will certainly be using it for now. The zsh version is 5.3.1 on Arch Linux, so unless there is a regression in this version I don't know whether what you mention applies. Essentially, the loop runs twice as many jobs as there are cores detected on the system. On my system that is 16 jobs running concurrently; once all 16 finish, 16 more are started, until all conversions are finished. The only other meaningful code in the script besides option parsing is this function, which handles the conversion:

```zsh
convert() {
  local -r file=$1
  if [[ -n ${outputdir} ]]; then
    local -r outdir=${outputdir[3]}  # user-defined directory for converted files
  else
    local -r outdir=$MUSICDIR/ALAC   # default when not specified on the command line
  fi
  local -r outfile=${outdir}/${file%flac}m4a

  # error check, unimportant to execution
  if [[ -n ${musicdir} ]] && [[ $(dirname ${file%/*}) != "." ]]; then
    mkdir -p ${outdir}/${file%/*}
  else
    mkdir -p ${outdir}
  fi

  # the print calls are just for async_process_results
  if ! [[ -f ${outfile} ]]; then
    if ! ffmpeg -i ${file} -vn -c:a alac ${outfile}; then
      print -u2 "'${outfile##*/}'"
      return 1
    fi
    print "'${outfile##*/}'"
  else
    print -u2 "'${outfile##*/}': exists"
    return 3
  fi
  return 0
}
```

Sorry if it is still hard to follow; this was just a brain dump that I want to clean up after I get it working correctly.
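For context, the print/return values in convert() surface in the completion callback: zsh-async invokes it as `callback $job $ret $stdout $exec_time $stderr $has_next`. A hypothetical callback matching the convention above:

```zsh
# Sketch: consuming convert()'s output in a zsh-async completion callback.
completed_callback() {
  local job=$1 ret=$2 out=$3 err=$5
  case $ret in
    0) print "converted: $out" ;;            # conversion succeeded
    1) print -u2 "ffmpeg failed: $err" ;;    # ffmpeg returned non-zero
    3) print "skipped (exists): $err" ;;     # output file already present
  esac
}
```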
Don't be so hard on yourself, it is fine 😄.
No, you're fine in that regard. I honestly didn't consider at first that this was running as a script (without ZLE).
Thanks for posting it. Now, I don't see anything obvious; however, would you mind changing the `$(dirname ${file%/*})` command substitution into a plain parameter expansion? Every command substitution forks a subshell, and ruling out the extra forks would help. I really wish I could make the notification (kill-signal) mechanism more reliable for scripts. There is one way to utilize the ZLE watcher in scripts as well, but you'd have to start an interactive shell to accomplish that; there's an example of how to do it in async_test.zsh#L482-L512.

PS. Out of curiosity, did you try the error redirection to a file that I suggested earlier?
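For illustration, zsh's `:h` (head) modifier does the same job as `dirname` using pure parameter expansion, with no fork:

```zsh
# $(dirname ...) forks a subshell on every call; the :h modifier does not.
file=some/dir/track.flac
print $(dirname ${file%/*})   # forks: prints "some"
print ${${file%/*}:h}         # no fork: also prints "some"
```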
I did try appending to '/tmp/err.log' on the line you mentioned; however, after the script gets stuck in the loop there is no error in that file, it's just empty. I also tried changing that `dirname` to a parameter expansion as you suggested and am still getting the same behavior. Fortunately, though, your workaround of polling with `async_process_results` in a loop works, so I can keep going for now.
Thanks for testing those cases @Asumerodi, it means I can cross some of my suspicions off the list. I'll try to look into this more deeply when I have some extra time. In the meantime, I'm glad that the workaround helped 🙂. Good luck with your script!
@Asumerodi it may be 2 years too late, but I might have a fix for your original issue in #45. If you're still using async for these kinds of things, feel free to give it a spin. If not, just thought I'd let you know.
Update 1: It turns out …

**Behavior**

@mafredri I may have encountered a similar problem in my async zsh prompt. I made 22 sections of the prompt async, but I found that 3 of the sections are never rendered.

**What I've dug into**

I tried adding echo traces inside the relevant functions, but some of the trace keywords are never printed. To check whether the worker is killed in my prompt code, I tried intercepting the relevant function, and the worker is killed (with my trace printed) only once. This seems like a bug.

**Reproduce the bug**

Load my prompt spacezsh and enable all the async sections by modifying the default settings.

**Temporary solution**

Since not all of the sections are time-consuming, I'll only make a section async as needed, depending on the execution time of the async job that I get in the callback function.
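A sketch of what that temporary solution could look like, using the execution time zsh-async passes as the fourth callback argument (the threshold and the SECTION_RUN_SYNC array are hypothetical):

```zsh
# Sketch: record per-section job duration in the completion callback and
# flag fast sections to render synchronously on later prompts.
typeset -gA SECTION_RUN_SYNC

section_callback() {
  local job=$1 ret=$2 out=$3 exec_time=$4
  # Hypothetical threshold: a section finishing in under 20ms gains little
  # from running async, so mark it to run synchronously next time.
  if (( exec_time < 0.02 )); then
    SECTION_RUN_SYNC[$job]=1
  fi
}
```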
I have created a batch conversion script to convert thousands of songs in zsh using this package. I am noticing, however, that after several hundred jobs the notify feature fails and I get stuck in an infinite loop waiting for a job to finish that is actually already done.

I know the job is finished because I am using the 'watch' utility to watch the subprocess run to completion. I am using a setup similar to, but more complex than, your example: a loop mechanism that only processes '$(nproc) * 2' jobs at a time so as not to overload the CPU. When this bug occurs the script can't continue to the next batch, and I have to kill it to continue.

I have tried killing the worker after each iteration and starting a new one at the beginning of the loop, but the same behavior continues. I have also tried modifying the script to run without limiting the number of jobs, and the same behavior still occurs after 200-300 jobs.

This is a big pain because once it crosses the threshold the script constantly fails, and I can only process 2 or 3 iterations on the next try before it fails again (my script checks for files that are already converted, so such a check still constitutes a job even though no conversion takes place).

Here is the relevant code section:
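A minimal sketch of the batching pattern described above, assuming zsh-async's documented API (`-n` for a notifying worker, `async_register_callback`); all names besides the API calls are hypothetical:

```zsh
# Hypothetical sketch: run $(nproc) * 2 jobs per batch and wait for the whole
# batch before starting the next; convert is the conversion job function.
integer batch=$(( $(nproc) * 2 )) completed=0 started=0

conv_callback() { (( completed++ )); }

async_start_worker conv_worker -n           # -n: notify on job completion
async_register_callback conv_worker conv_callback

for file in $files; do
  async_job conv_worker convert $file
  (( started++ ))
  if (( started % batch == 0 )); then
    # Wait for the notify mechanism to deliver the batch's results; this is
    # the wait that hangs once the bug is triggered.
    while (( completed < started )); do sleep 0.01; done
  fi
done

async_stop_worker conv_worker
```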