APScheduler 4.0.0 Bug #803
Comments
Schedules are supposed to be deleted when they've run their course. Is this what you're seeing? Without a reproducing example I can't say much more.
Yes, sure. In my case they should add a new schedule with the same name at the end of the task. I was referring to the non-existence of the new schedules.
I was able to reproduce this bug with 4.0.0a3 as well. Posting my example now. Just a quick comment: if you use
Could you try with v4.0.0a4?
I upgraded APScheduler, deleted the old tables in the database, and tried to use the same code as before, but now I'm getting this error:
Sorry, my phone messes with the formatting in the terminal; I can get you a properly formatted one later.
That looks like there are strings leaking to where only datetimes should go. Can you provide a simpler way to reproduce the problem that you're seeing?
The error is from the interval trigger; just create a schedule with an interval trigger. It executes once and then raises this error.
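For reference, a minimal reproducer along those lines might look like this. This is only a sketch assuming the v4 alpha API; the database URL is a placeholder:

```python
import anyio
from apscheduler import AsyncScheduler
from apscheduler.datastores.sqlalchemy import SQLAlchemyDataStore
from apscheduler.triggers.interval import IntervalTrigger
from sqlalchemy.ext.asyncio import create_async_engine


def tick():
    print("tick")


async def main():
    # Placeholder connection URL; any PostgreSQL database should do
    engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/testdb")
    data_store = SQLAlchemyDataStore(engine)
    async with AsyncScheduler(data_store) as scheduler:
        # Per the report above, the job runs once and the error follows
        # when the interval trigger is processed again
        await scheduler.add_schedule(tick, IntervalTrigger(seconds=10))
        await scheduler.run_until_stopped()


anyio.run(main)
```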
It works with the memory store, but with the SQLAlchemy store I'm getting a bizarre (but different) error about the function reference, which I'm investigating.
That was a false alarm: it was caused by a leftover schedule from another run. When I cleared the database, it ran perfectly. Is the PostgreSQL standalone example not working for you?
I created a whole new database and ran the standalone example, which works fine for me!
Ok, I missed your use of JSONSerializer there. I've actually just started a discussion on this, and would like to know your reasons for a serializer other than
Replied there as well. For me it was more about debugging, to understand what arguments are going in.
And I think I found the problem causing this bug of the disappearing schedules:

```python
import traceback
from datetime import datetime, timedelta, timezone
from typing import Optional

import apscheduler
import asyncpg
from apscheduler import ScheduleLookupError
from apscheduler.triggers.date import DateTrigger

# ExtendedAsyncScheduler is the reporter's own AsyncScheduler subclass
# (defined elsewhere); it carries a logger and an asyncpg pool as `db`.


async def regular_check(check_id: str, scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        # Fall back to the scheduler of the current context
        # (raises LookupError if there is none)
        scheduler = apscheduler.current_async_scheduler.get()
    check_id = check_id.replace("regular_check-", "")
    check_no, iteration = check_id.split("+")
    iteration = int(iteration)
    try:
        old_schedule = await scheduler.get_schedule(f"regular_check-{check_id}")
    except ScheduleLookupError:
        old_schedule = None
    try:
        # Reschedule this check five minutes from now, with the iteration
        # counter bumped so the new schedule ID differs from the old one
        run_date = datetime.now(tz=timezone.utc) + timedelta(minutes=5)
        await scheduler.add_schedule(regular_check,
                                     DateTrigger(run_time=run_date),
                                     id=f"regular_check-{check_no}+{iteration + 1}",
                                     misfire_grace_time=300,
                                     conflict_policy=apscheduler.ConflictPolicy.exception,
                                     args=(f"{check_no}+{iteration + 1}",),
                                     max_jitter=30)
        try:
            new_schedule = await scheduler.get_schedule(f"regular_check-{check_no}+{iteration + 1}")
        except ScheduleLookupError:
            new_schedule = None
        scheduler.logger.info(f"Old schedule: {old_schedule=}\nNew schedule: {new_schedule=}" +
                              f"\nAdded regular check {check_id} {iteration + 1} at {run_date}")
    except Exception:
        scheduler.logger.error(f"Error in regular check {check_id}\n{traceback.format_exc()}")


async def create_schedules(scheduler: Optional[ExtendedAsyncScheduler] = None):
    if not scheduler:
        scheduler = apscheduler.current_async_scheduler.get()
    db: asyncpg.Pool = scheduler.db
    try:
        async with db.acquire() as conn:
            for i in range(2000):
                x = str(i).zfill(4)  # zero-pad to four digits
                try:
                    schedule = await conn.fetchrow("SELECT * from jsonserializer.schedules where id ilike $1",
                                                   f"regular_check-{x}%")
                except Exception:
                    traceback.print_exc()
                    raise
                if not schedule:
                    await scheduler.add_job(regular_check, args=(f"{x}+0",))
        scheduler.logger.error("Finished creating schedules")
    except Exception:
        scheduler.logger.error(f"Error in create_schedules\n{traceback.format_exc()}")
```
Schedules are always deleted when their triggers don't produce any more fire times; that is normal. Is that what you mean?
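To illustrate that behavior, here is a minimal sketch (assuming the v4 alpha API and the default in-memory data store): a one-shot DateTrigger schedule is gone once its trigger can produce no more fire times.

```python
from datetime import datetime, timedelta, timezone

import anyio
from apscheduler import AsyncScheduler, ScheduleLookupError
from apscheduler.triggers.date import DateTrigger


def one_shot():
    print("ran once")


async def main():
    async with AsyncScheduler() as scheduler:  # defaults to the memory data store
        run_time = datetime.now(tz=timezone.utc) + timedelta(seconds=1)
        await scheduler.add_schedule(one_shot, DateTrigger(run_time=run_time), id="one-shot")
        await scheduler.start_in_background()
        await anyio.sleep(3)
        try:
            await scheduler.get_schedule("one-shot")
        except ScheduleLookupError:
            # The exhausted schedule has been removed, as described above
            print("schedule is gone after its only fire time")


anyio.run(main)
```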
Schedule 1 with a DateTrigger is executed, and during the execution a new schedule 2 with the same ID is created. By using truly unique IDs this can be avoided (see the sketch below). Sadly, the part with the KeyError still happens.
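A sketch of the unique-ID workaround mentioned here, reusing names from the snippet above (the uuid4 suffix is an illustration, not the reporter's exact code):

```python
from uuid import uuid4

from apscheduler.triggers.date import DateTrigger


async def reschedule_check(scheduler, check_no: str, run_date):
    # A fresh UUID per run means the new schedule can never collide with
    # the ID of the schedule that spawned the currently running job
    schedule_id = f"regular_check-{check_no}-{uuid4()}"
    await scheduler.add_schedule(regular_check, DateTrigger(run_time=run_date),
                                 id=schedule_id, args=(check_no,))
```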
Is your expectation that the schedule remains available as long as there are any active jobs associated with it?
As long as it is consistent, I am okay with either one. I would prefer it not being available unless it produces another run time, because the schedule is exhausted and therefore the schedule ID should be free to use again.
That sounds reasonable. The reason why it works the way it does now is that
With this out of the way, the initially reported error still occurs 🙈
Can you clarify what you mean?
The schedulers fetch jobs in FIFO order, by
The initial error was a KeyError when removing job.id from self._running_jobs.
I "fixed" that by doing
Can you reproduce this with the latest
Yes, I will have it running over the next few hours.
I get no error messages, but the number of jobs is still decaying. After 15 minutes the number of regular_checks dropped from 2000 to 911.
So let me just make sure we're on the same page here. Why are you replacing the schedule from a job belonging to that schedule?
I have an API I need to call for a list of identifiers over and over again. The time of the next call depends on the API response for the identifier. The function making the API calls is
With APScheduler v3 I used an SQLite jobstore and never had any issues with this way of doing it. I can see that using the same ID again causes issues, but I would argue that the resulting behavior is not intuitive (especially considering the conflict policy used).
If you're using
Isn't that what I am doing in regular_check?
Ok, I think I didn't read everything thoroughly, so the later snippet is your solution/workaround for the original problem, correct? That would indeed work. The scheduler deletes a schedule when its trigger can no longer generate any new fire times, and this may happen while the job that was spawned from said schedule is still running. I'm wondering if perhaps it shouldn't be deleted then. Perhaps finished schedules should instead be cleaned up by the data store's cleanup procedure, which would ensure that only schedules with no running jobs are deleted.
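A conceptual sketch of that idea follows. None of the names below come from APScheduler's actual data store code; they only illustrate the proposal that a finished schedule is removed only once no running job references it:

```python
# Hypothetical cleanup pass illustrating the proposal above;
# method and attribute names are placeholders, not APScheduler API
def cleanup_finished_schedules(data_store):
    # IDs of schedules that still have running jobs
    busy_ids = {job.schedule_id for job in data_store.get_jobs() if job.schedule_id}
    finished = [
        schedule.id
        for schedule in data_store.get_schedules()
        # next_fire_time is None once the trigger is exhausted
        if schedule.next_fire_time is None and schedule.id not in busy_ids
    ]
    if finished:
        data_store.remove_schedules(finished)
```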
It doesn't really solve it, because the schedules still somehow get lost. It seems to be related to the number of schedules/jobs: if you keep them below 800 or so, they stay stable for hours. So I assume the issue is caused by some kind of bad timing between cleanups, new schedules being stored, and the duration of a job. While I agree it is somewhat silly and fixable on my side, this behavior is not predictable and shouldn't happen. Besides the solution you suggested, I can think of another one: the check that the trigger produces no further fire times should be redone/the trigger should be compared again right before the removal of the schedule, to prevent exactly this.
And yes, the later snippet with the added incremental counter fixes it, because then the change in the trigger is also a change in the schedule.
I forgot that this issue should already be fixed in v4.0.0a5. Please let me know either way when you test it.
Still occurring.
Hello, I get the same crashes at random times. This is the scheduler part:
This is the only additional interaction with the scheduler in the code:
And sometimes the scheduler just crashes with a long traceback, but the final part is the same:
It would be helpful to have a minimal working example. All the snippets in this issue are either unnecessarily complex or incomplete. One thing to consider is the run time of the jobs: if the jobs run for longer than 15 minutes, they could trigger a known bug which is present even in v4.0.0a5. @doluk When you say it's still happening, do you mean the original issue (with all those exceptions) or something else, namely the number of schedules being wrong? @Soulbadguy54 Could you produce a minimal working example I could run to reproduce the issue using just the memory data store?
It's the schedules going missing; sorry, I worded it badly.
So there are two different issues here: schedules going missing, and the KeyError about job IDs. I'm not sure yet that they're related.
They can be related, since the scheduler tries to delete a missing job or something like this ...
But if the scheduler tried to delete jobs that aren't there, there would be errors like the ones above. I'm not seeing any, however, when I try to reproduce the issue with your script.
So my example runs with a PostgreSQL database (version 16) without any decaying number of schedules? How many schedules do you let it create? For smaller numbers, below 2000, it takes way longer to occur or never does. If I let it create 6000, I can see them drop within 15 minutes.
As I recall, I saw a decaying number of schedules the last time I ran it, but I didn't see any KeyErrors.
Yes, no KeyErrors anymore.
apscheduler==4.0.0a5. Same error here:
My scheduler is simple:
I get the same KeyError. If it's important to know, I'm using APScheduler 4.0.0a5, the MongoDB data store, and Python 3.9. I see that there are already several other code examples above where the same error occurs, so I will not post example code here. Error log:
I believe this has been fixed in master now. Let me know if it happens again.
I am using the latest released version (APScheduler==4.0.0a5) and the error still occurs. This should be reopened.
Why, if it's fixed in master? Have you tried the master version?
This error had stopped for a while. However, it recently started happening again. I've tried to find out why this is happening, and I haven't found the cause.
I've never encountered this issue in my own experiments or tests, which is why I'd need a reproducible example.
Could you share one of your own tracebacks where this happens?
Absolutely. I'm currently using a modified fork where I ignore this error, but I'll try running the current version of your branch and see if the error occurs again.
Ok, I finally have a reproducer for this, provided to me in #952. I'm investigating. Please subscribe to that issue for further updates.
Things to check first
I have checked that my issue does not already have a solution in the FAQ
I have searched the existing issues and didn't find my bug already reported there
I have checked that my bug is still present in the latest release
Version
4.0.0a2.post72
What happened?
I know it is an alpha, etc., but I wanted to report it because I didn't observe it with a lower number of jobs:
I use APScheduler with a PostgreSQL job store and have around 5000 schedules, all running the same task, just with different arguments. They are scheduled with a DateTrigger and reschedule themselves at the end of the task for the next execution.
After some runtime, some of the schedules are simply gone, even though the task itself had no errors and in theory always ensures a rescheduling. This means the number of schedules drops constantly. I observed this behavior with the thread executor but also with the normal async executor. With the thread executor the complete scheduler would occasionally crash; with the async one at least the scheduler stays alive. But in both cases the jobs table has an enormous number of rows. Right now I have around 100k rows in there (I can provide a dump if needed). The logs also show the following error multiple times (same error, just with a different scope value and UUID):
How can we reproduce the bug?
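The original reproduction snippet is not preserved here. A minimal sketch of the pattern described above (a job that re-adds a DateTrigger schedule for itself when it finishes) might look like the following, assuming the v4 alpha API; the database URL and all names are placeholders:

```python
from datetime import datetime, timedelta, timezone

import anyio
from apscheduler import AsyncScheduler, current_async_scheduler
from apscheduler.datastores.sqlalchemy import SQLAlchemyDataStore
from apscheduler.triggers.date import DateTrigger
from sqlalchemy.ext.asyncio import create_async_engine


async def self_rescheduling_task(ident: str):
    # ... the actual per-identifier work would happen here ...
    # At the end of the task, re-add a schedule with the same ID,
    # as described in the report above
    scheduler = current_async_scheduler.get()
    next_run = datetime.now(tz=timezone.utc) + timedelta(minutes=5)
    await scheduler.add_schedule(self_rescheduling_task, DateTrigger(run_time=next_run),
                                 id=f"check-{ident}", args=(ident,))


async def main():
    engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/testdb")  # placeholder
    async with AsyncScheduler(SQLAlchemyDataStore(engine)) as scheduler:
        # Create ~5000 schedules for the same task with different arguments
        for i in range(5000):
            await scheduler.add_schedule(
                self_rescheduling_task,
                DateTrigger(run_time=datetime.now(tz=timezone.utc) + timedelta(seconds=30)),
                id=f"check-{i:04d}",
                args=(f"{i:04d}",),
            )
        await scheduler.run_until_stopped()


anyio.run(main)
```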