Skip to content

Commit

Permalink
add readme documentation about background scheduler
Browse files Browse the repository at this point in the history
  • Loading branch information
danilojezernik committed Sep 9, 2024
1 parent 6766ece commit 921baa9
Showing 1 changed file with 145 additions and 0 deletions.
145 changes: 145 additions & 0 deletions documentation/background-scheduler/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
# Background Scheduler Documentation

## What is a Background Scheduler?

A Background Scheduler is a tool that allows you to run tasks or functions at specific intervals, asynchronously,
without blocking the main application process. It is often used to perform background tasks that need to run
periodically, such as data synchronization, sending notifications, running maintenance tasks, or cleaning up resources.

**APScheduler** (Advanced Python Scheduler) is a popular Python library that provides powerful functionality for
scheduling jobs in the background. It supports various job stores, execution options, and flexibility in scheduling.

## How Background Scheduler is Typically Used

1. **Running Periodic Tasks**: Background schedulers are commonly used for tasks that need to be executed at regular
intervals, like cleaning up expired data from a database or updating a cache.
2. **Scheduled Notifications**: Sending scheduled emails, reminders, or push notifications at set intervals or times.
3. **Data Syncing**: Automatically fetching or syncing data from external services (e.g., APIs) at regular intervals to
ensure fresh and up-to-date data.
4. **Database Maintenance**: Running database cleanup jobs, backup tasks, or archiving operations periodically.

## APScheduler Components

APScheduler works by scheduling jobs and running them either once or on a repeating schedule. Here are some of the key
components:

- **Job**: A task or function that you want to schedule. This can be a function or method.
- **Trigger**: Specifies when a job should be executed. Commonly used triggers include `interval` (to run periodically),
`date` (to run once at a specific time), and `cron` (to run based on a cron expression).
- **Job Store**: Where the job metadata is stored (in-memory, file, or a database).
- **Executor**: Specifies how the job should be executed (e.g., in a thread or in a process).

### Basic Example of Using APScheduler

```python
from apscheduler.schedulers.background import BackgroundScheduler
import time


# A simple function to be scheduled
def job():
print("Running scheduled task")


# Create a scheduler instance
scheduler = BackgroundScheduler()

# Add a job to run every 10 seconds
scheduler.add_job(job, 'interval', seconds=10)

# Start the scheduler in the background
scheduler.start()

# Keep the script running to allow the scheduler to run in the background
try:
while True:
time.sleep(2)
except (KeyboardInterrupt, SystemExit):
scheduler.shutdown() # Shut down the scheduler when exiting the app
```

In this example:

- The `job` function will be executed every 10 seconds without blocking the main program.
- The `BackgroundScheduler` allows the scheduler to run in the background while the main application continues to
operate.

### Common Use Cases for Background Schedulers

- **Web Applications**: Running tasks in the background to offload work from the main application, such as updating
data, sending notifications, or managing queues.
- **Data Pipelines**: Running periodic data aggregation or transformation tasks in the background without user
intervention.
- **Monitoring Systems**: Scheduling tasks to monitor services or systems and send alerts or take action when certain
conditions are met.
- **Maintenance Jobs**: Scheduling cleanup tasks, backups, or other administrative tasks that need to run regularly
without user interaction.

## How Background Scheduler is Used in This Route

In the provided route, the **APScheduler** is used to automate the updating of programming language tags from the Stack
Overflow API at regular intervals (every 24 hours). This approach ensures that the data in the database remains
up-to-date without having to rely on manual updates or requests to trigger the update process.

### Key Functions Using Background Scheduler

1. `update_tags_in_db` Function:
- This function fetches the latest tags from the Stack Overflow API and updates the database with the new data.
- It is an asynchronous function, allowing it to run in the background without blocking other operations of the
application.

2. `start_scheduler` Function:
- This function is responsible for initializing the BackgroundScheduler and scheduling the update_tags_in_db
function to run every 24 hours.
- It sets the trigger type to 'interval', which means the job will run at fixed intervals, specifically every 24
hours in this case.
- The scheduler runs as a background process, meaning the main application does not need to stop or wait for it to
complete.

#### How It Works in the Application

- **Scheduled Update**: The scheduler runs automatically upon the application's startup and ensures that the Stack
Overflow tags are updated every 24 hours without user intervention.
- **Non-Blocking**: Since the scheduler runs in the background, the main FastAPI application is free to handle user
requests and other processes without being blocked by this scheduled task.
- **Database Update**: Every 24 hours, the `update_tags_in_db` function fetches the latest tags from the Stack Overflow
API and updates the database with the new tag data, replacing the older entries.

```python
# Starts the scheduler to update Stack Overflow tags every 24 hours
def start_scheduler():
scheduler = BackgroundScheduler()

# Add a job to run the update_tags_in_db function every 24 hours
scheduler.add_job(lambda: asyncio.run(update_tags_in_db()), 'interval', hours=24)

# Start the scheduler in the background
scheduler.start()
```

### Advantages of Using APScheduler in This Route:

- **Automation**: The update process is fully automated, so no manual triggers are needed to keep the data fresh.
- **Efficiency**: Since the data is updated every 24 hours, API calls are limited, reducing the risk of hitting the
quota limit for the Stack Overflow API.
- **Scalability**: As the application scales, the scheduler will continue to handle periodic updates without additional
complexity.
- **Separation of Concerns**: The main application remains focused on handling user requests, while the scheduler
manages the background task of updating the database.

### Other Use Cases of Background Scheduler in FastAPI:

- **Sending Automated Emails**: You can use APScheduler to schedule email reminders or notifications based on user
actions or time intervals.
- **Daily Data Backups**: Schedule regular backups of your database to prevent data loss.
- **Periodic Cache Refreshing**: In applications that rely on external data sources, background schedulers can
periodically refresh the cache to ensure the data is always fresh and up-to-date.
- **Maintenance Tasks**: Tasks such as clearing old logs, cleaning up expired sessions, or recalculating metrics can be
scheduled during off-peak hours.

## Conclusion

A **Background Scheduler** is a powerful tool for handling automated, periodic tasks within your applications. In the
context of your FastAPI application, **APScheduler** is used to periodically update data from the Stack Overflow API,
ensuring the system stays current without manual intervention. This enables efficient, scalable, and automated
management of tasks that need to be performed regularly in the background.

0 comments on commit 921baa9

Please sign in to comment.