Email Flow Pipeline

Overview

This pipeline automatically parses emails for financial transaction data and saves the extracted information to a PostgreSQL database. A machine learning (ML) component classifies each transaction, and a manual re-classification mechanism lets the user correct any transaction the model misclassifies. Planned enhancements include automated model retraining and a live dashboard.

Features

Real-Time Email Parsing: Automatically parses incoming emails for financial transaction data.
Data Storage: Saves parsed transaction data into a PostgreSQL database.
ML Classification: Classifies transactions using a machine learning model.
Manual Re-Classification: Allows manual re-classification of transactions.
Future Enhancements:
- Automated model retraining as new data comes in.
- Live dashboard with a connection to the PostgreSQL database.

Components

parse_email
- Monitors email inbox for new messages.
- Extracts relevant financial transaction data.
- Pushes the extracted data to the transactions Kafka topic.
classify_transactions
- Uses a recurrent neural network (RNN) to classify incoming transactions.
- Pushes each classified transaction to the classified_transactions topic.
- Sends a Google Form to the user so they can re-classify the transaction if necessary.
process_transaction_categorization_form_submission
- Processes Google Form submissions for manual re-classifications and updates the corresponding database record.
log_transaction_form_submission
- Enables the user to manually log transactions that the pipeline did not capture automatically.
write_to_db
- Writes classified transaction data to the PostgreSQL database.
Data Storage
- PostgreSQL database to store transaction data.
- Schema designed to efficiently store and query financial transactions.
ML Classifier
- Recurrent neural network (RNN) that classifies transactions.
- Model trained on historical transaction data.
- Classification results are saved to the PostgreSQL database.
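
The parse, store, and manual re-classification steps described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the email body format, the regular expression, the `transactions` table layout, and the function names `parse_transaction`, `save_transaction`, and `reclassify` are all assumptions. The standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a server, and the initial category is hard-coded where the real pipeline would use the model's prediction.

```python
import re
import sqlite3
from decimal import Decimal
from typing import Optional

# Hypothetical alert format: "You spent $12.50 at COFFEE SHOP on 2024-01-15."
PATTERN = re.compile(
    r"\$(?P<amount>\d+(?:\.\d{2})?) at (?P<merchant>.+?) on (?P<date>\d{4}-\d{2}-\d{2})"
)

# Assumed table layout; amounts are stored as integer cents to avoid float rounding.
SCHEMA = """
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    txn_date TEXT NOT NULL,
    merchant TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    category TEXT,
    manually_reviewed INTEGER NOT NULL DEFAULT 0
)
"""


def parse_transaction(body: str) -> Optional[dict]:
    """Return the transaction fields found in an email body, or None."""
    match = PATTERN.search(body)
    if match is None:
        return None
    return {
        "amount": Decimal(match["amount"]),
        "merchant": match["merchant"].strip(),
        "date": match["date"],
    }


def save_transaction(conn: sqlite3.Connection, txn: dict, category: str) -> int:
    """Insert a classified transaction and return its row id."""
    cur = conn.execute(
        "INSERT INTO transactions (txn_date, merchant, amount_cents, category) "
        "VALUES (?, ?, ?, ?)",
        (txn["date"], txn["merchant"], int(txn["amount"] * 100), category),
    )
    conn.commit()
    return cur.lastrowid


def reclassify(conn: sqlite3.Connection, txn_id: int, new_category: str) -> None:
    """Apply a manual correction and flag the row as reviewed."""
    conn.execute(
        "UPDATE transactions SET category = ?, manually_reviewed = 1 WHERE id = ?",
        (new_category, txn_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
txn = parse_transaction("You spent $12.50 at COFFEE SHOP on 2024-01-15.")
txn_id = save_transaction(conn, txn, category="Groceries")  # placeholder for the model's output
reclassify(conn, txn_id, "Dining")  # correction submitted by the user
row = conn.execute(
    "SELECT merchant, amount_cents, category, manually_reviewed "
    "FROM transactions WHERE id = ?",
    (txn_id,),
).fetchone()
print(row)
```

Keeping a `manually_reviewed` flag alongside the category is one way to separate user-confirmed labels from model predictions, which would be useful if the corrected rows later feed model retraining.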

Deployment

The email-pipeline and email-pipeline-ml Docker images contain the core codebase and the entrypoints that carry out specific tasks for different parts of the pipeline.
The images are uploaded to Google Artifact Registry and deployed as Cloud Run services triggered via Pub/Sub topics.
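
When a Pub/Sub push subscription triggers a Cloud Run service, the message arrives as an HTTP POST whose JSON body wraps the payload in a `message` object, with the payload base64-encoded in `message.data`. The sketch below shows how an entrypoint might unwrap that envelope. The function name `decode_pubsub_push`, the sample payload fields, and the subscription path are hypothetical, and the sketch assumes the publisher sent JSON.

```python
import base64
import json


def decode_pubsub_push(envelope: dict) -> dict:
    """Extract and decode the JSON payload from a Pub/Sub push request body."""
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError("not a Pub/Sub push envelope")
    raw = base64.b64decode(message["data"])
    return json.loads(raw.decode("utf-8"))


# Build a sample envelope the way Pub/Sub would deliver it (illustrative values).
payload = {"merchant": "COFFEE SHOP", "amount": "12.50"}
envelope = {
    "message": {
        "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
}
decoded = decode_pubsub_push(envelope)
print(decoded)
```

Rejecting malformed envelopes with an explicit error lets the HTTP handler return a 4xx response instead of failing later on a missing field.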

Next Steps

Automated Model Retraining
- Implement a mechanism to automatically retrain the ML model as new data is added to the database.
Live Dashboard
- Develop a dashboard with a live connection to the PostgreSQL database.
- Visualize transaction data and classification results in real time.

timothyhinh79/email_flow
