From 921baa9efdba96ae2e07078a2f30dbd148907d83 Mon Sep 17 00:00:00 2001 From: Danilo Jezernik Date: Mon, 9 Sep 2024 08:38:54 +0200 Subject: [PATCH] add readme documentation about background scheduler --- documentation/background-scheduler/README.md | 145 +++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 documentation/background-scheduler/README.md diff --git a/documentation/background-scheduler/README.md b/documentation/background-scheduler/README.md new file mode 100644 index 0000000..ab3f4d2 --- /dev/null +++ b/documentation/background-scheduler/README.md @@ -0,0 +1,145 @@ +# Background Scheduler Documentation + +## What is a Background Scheduler? + +A Background Scheduler is a tool that allows you to run tasks or functions at specific intervals, asynchronously, +without blocking the main application process. It is often used to perform background tasks that need to run +periodically, such as data synchronization, sending notifications, running maintenance tasks, or cleaning up resources. + +**APScheduler** (Advanced Python Scheduler) is a popular Python library that provides powerful functionality for +scheduling jobs in the background. It supports various job stores, execution options, and flexibility in scheduling. + +## How Background Scheduler is Typically Used + +1. **Running Periodic Tasks**: Background schedulers are commonly used for tasks that need to be executed at regular + intervals, like cleaning up expired data from a database or updating a cache. +2. **Scheduled Notifications**: Sending scheduled emails, reminders, or push notifications at set intervals or times. +3. **Data Syncing**: Automatically fetching or syncing data from external services (e.g., APIs) at regular intervals to + ensure fresh and up-to-date data. +4. **Database Maintenance**: Running database cleanup jobs, backup tasks, or archiving operations periodically. + +## APScheduler Components + +APScheduler works by scheduling jobs and running them either once or on a repeating schedule. Here are some of the key +components: + +- **Job**: A task or function that you want to schedule. This can be a function or method. +- **Trigger**: Specifies when a job should be executed. Commonly used triggers include `interval` (to run periodically), + `date` (to run once at a specific time), and `cron` (to run based on a cron expression). +- **Job Store**: Where the job metadata is stored (in-memory, file, or a database). +- **Executor**: Specifies how the job should be executed (e.g., in a thread or in a process). + +### Basic Example of Using APScheduler + +```python +from apscheduler.schedulers.background import BackgroundScheduler +import time + + +# A simple function to be scheduled +def job(): + print("Running scheduled task") + + +# Create a scheduler instance +scheduler = BackgroundScheduler() + +# Add a job to run every 10 seconds +scheduler.add_job(job, 'interval', seconds=10) + +# Start the scheduler in the background +scheduler.start() + +# Keep the script running to allow the scheduler to run in the background +try: + while True: + time.sleep(2) +except (KeyboardInterrupt, SystemExit): + scheduler.shutdown() # Shut down the scheduler when exiting the app +``` + +In this example: + +- The `job` function will be executed every 10 seconds without blocking the main program. +- The `BackgroundScheduler` allows the scheduler to run in the background while the main application continues to + operate. + +### Common Use Cases for Background Schedulers + +- **Web Applications**: Running tasks in the background to offload work from the main application, such as updating + data, sending notifications, or managing queues. +- **Data Pipelines**: Running periodic data aggregation or transformation tasks in the background without user + intervention. +- **Monitoring Systems**: Scheduling tasks to monitor services or systems and send alerts or take action when certain + conditions are met. +- **Maintenance Jobs**: Scheduling cleanup tasks, backups, or other administrative tasks that need to run regularly + without user interaction. + +## How Background Scheduler is Used in This Route + +In the provided route, the **APScheduler** is used to automate the updating of programming language tags from the Stack +Overflow API at regular intervals (every 24 hours). This approach ensures that the data in the database remains +up-to-date without having to rely on manual updates or requests to trigger the update process. + +### Key Functions Using Background Scheduler + +1. `update_tags_in_db` Function: + - This function fetches the latest tags from the Stack Overflow API and updates the database with the new data. + - It is an asynchronous function, allowing it to run in the background without blocking other operations of the + application. + +2. `start_scheduler` Function: + - This function is responsible for initializing the BackgroundScheduler and scheduling the update_tags_in_db + function to run every 24 hours. + - It sets the trigger type to 'interval', which means the job will run at fixed intervals, specifically every 24 + hours in this case. + - The scheduler runs as a background process, meaning the main application does not need to stop or wait for it to + complete. + +#### How It Works in the Application + +- **Scheduled Update**: The scheduler runs automatically upon the application's startup and ensures that the Stack + Overflow tags are updated every 24 hours without user intervention. +- **Non-Blocking**: Since the scheduler runs in the background, the main FastAPI application is free to handle user + requests and other processes without being blocked by this scheduled task. +- **Database Update**: Every 24 hours, the `update_tags_in_db` function fetches the latest tags from the Stack Overflow + API and updates the database with the new tag data, replacing the older entries. + +```python +# Starts the scheduler to update Stack Overflow tags every 24 hours +def start_scheduler(): + scheduler = BackgroundScheduler() + + # Add a job to run the update_tags_in_db function every 24 hours + scheduler.add_job(lambda: asyncio.run(update_tags_in_db()), 'interval', hours=24) + + # Start the scheduler in the background + scheduler.start() +``` + +### Advantages of Using APScheduler in This Route: + +- **Automation**: The update process is fully automated, so no manual triggers are needed to keep the data fresh. +- **Efficiency**: Since the data is updated every 24 hours, API calls are limited, reducing the risk of hitting the + quota limit for the Stack Overflow API. +- **Scalability**: As the application scales, the scheduler will continue to handle periodic updates without additional + complexity. +- **Separation of Concerns**: The main application remains focused on handling user requests, while the scheduler + manages the background task of updating the database. + +### Other Use Cases of Background Scheduler in FastAPI: + +- **Sending Automated Emails**: You can use APScheduler to schedule email reminders or notifications based on user + actions or time intervals. +- **Daily Data Backups**: Schedule regular backups of your database to prevent data loss. +- **Periodic Cache Refreshing**: In applications that rely on external data sources, background schedulers can + periodically refresh the cache to ensure the data is always fresh and up-to-date. +- **Maintenance Tasks**: Tasks such as clearing old logs, cleaning up expired sessions, or recalculating metrics can be + scheduled during off-peak hours. + +## Conclusion + +A **Background Scheduler** is a powerful tool for handling automated, periodic tasks within your applications. In the +context of your FastAPI application, **APScheduler** is used to periodically update data from the Stack Overflow API, +ensuring the system stays current without manual intervention. This enables efficient, scalable, and automated +management of tasks that need to be performed regularly in the background. \ No newline at end of file