HR Matchmaker automates recruitment screening by matching job applicants to open positions using Natural Language Processing (NLP) and machine learning. It uses transformer sentence embeddings by default, accelerates processing on Apple Silicon, and provides real-time matching with interactive visualization.
- Dual Model Support
  - Transformer-based Model (Default)
    - Uses SentenceTransformer with MiniLM architecture
    - Optimized for Apple Silicon (M1/M2) using MPS
    - Automatic fallback to CPU when needed
  - TF-IDF Model (Alternative)
    - N-gram analysis (up to trigrams)
    - Custom vocabulary size and document frequency thresholds
    - Sublinear scaling for better term weighting
- Input Processing
  - Automated CSV file monitoring
  - Real-time data validation
  - Text preprocessing and standardization
- Model Training
  - Dynamic model selection via environment variables
  - Automatic retraining on data updates
  - Optimized batch processing
  - Hardware-specific optimizations
- Matching Engine
  - Cosine similarity computation
  - Configurable matching thresholds
  - Batch-processed scoring
  - Memory-efficient operations
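
The matching step above can be illustrated with a minimal, dependency-free sketch. The function names, the dictionary layout, and the `0.5` default threshold are assumptions made for illustration; the project's own engine may vectorise this with NumPy or PyTorch instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def score_applicants(job_vector, applicant_vectors, threshold=0.5):
    """Return (applicant_id, score) pairs at or above threshold, best first.

    applicant_vectors is a hypothetical {applicant_id: vector} mapping.
    """
    scored = [
        (applicant_id, cosine_similarity(job_vector, vector))
        for applicant_id, vector in applicant_vectors.items()
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Calling `score_applicants(job_vec, {"a": vec_a, "b": vec_b}, threshold=0.6)` returns only the applicants whose score reaches 0.6, ordered from best to worst match.
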
- Control Panel
  - Job position selection
  - Minimum match score adjustment
  - Maximum applicant display limit
  - Real-time job requirement display
- Visualization
  - Interactive match score charts
  - Detailed applicant profiles
  - Expandable resume sections
  - Score-based ranking
- Transformer Model
  - Model: all-MiniLM-L6-v2
  - Hardware acceleration support
  - Batch processing optimization
  - Automatic device selection
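
As a rough sketch of how the transformer settings above might be used with the `sentence-transformers` library: the function name `embed_texts`, the `batch_size=32` default, and the choice to normalise embeddings are assumptions, not taken from `transformer_model.py`.

```python
MODEL_NAME = "all-MiniLM-L6-v2"


def embed_texts(texts, device="cpu", batch_size=32):
    """Encode texts into L2-normalised sentence embeddings.

    device is passed straight to SentenceTransformer, e.g. "mps" or "cpu".
    """
    # Imported lazily so this module loads even if the dependency is absent.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME, device=device)
    return model.encode(
        list(texts),
        batch_size=batch_size,      # assumed value; tune for available memory
        convert_to_numpy=True,
        normalize_embeddings=True,  # unit vectors: dot product equals cosine
    )
```

For this model the result is an array with one 384-dimensional row per input text. Because the rows are normalised, the cosine similarity between a job and an applicant reduces to a dot product.
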
- TF-IDF Model
  - N-gram range: (1, 3)
  - Max features: 15,000
  - Frequency thresholds: min_df=2, max_df=0.85
  - Sublinear TF scaling
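
The TF-IDF settings listed above map directly onto parameters of scikit-learn's `TfidfVectorizer`. The sketch below assumes scikit-learn is the library in use; `TFIDF_PARAMS` and `build_vectorizer` are hypothetical names.

```python
TFIDF_PARAMS = {
    "ngram_range": (1, 3),   # unigrams, bigrams, and trigrams
    "max_features": 15000,   # keep the 15,000 most frequent terms
    "min_df": 2,             # drop terms found in fewer than 2 documents
    "max_df": 0.85,          # drop terms found in more than 85% of documents
    "sublinear_tf": True,    # replace tf with 1 + log(tf)
}


def build_vectorizer():
    """Create a TfidfVectorizer configured with the values above."""
    # Imported lazily so the settings can be inspected without scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer

    return TfidfVectorizer(**TFIDF_PARAMS)
```

Sublinear scaling dampens the effect of a term repeated many times in one resume, so a keyword mentioned twenty times does not count twenty times as much as a single mention.
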
- File Monitoring
  - Real-time CSV change detection
  - Automatic model retraining
  - Error handling and logging
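
One simple way to detect CSV changes is to poll the file's modification time and size. This is a standard-library sketch only: `CsvChangeDetector` is a hypothetical name, and `file_monitor.py` may rely on an event-based watcher instead of polling.

```python
import os


class CsvChangeDetector:
    """Report whether a file changed since the previous check."""

    def __init__(self, path):
        self.path = path
        self._signature = self._read_signature()

    def _read_signature(self):
        """Return (mtime_ns, size) for the file, or None if it is missing."""
        try:
            stat = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (stat.st_mtime_ns, stat.st_size)

    def has_changed(self):
        """Compare against the last seen signature and remember the new one."""
        current = self._read_signature()
        changed = current != self._signature
        self._signature = current
        return changed
```

A monitoring loop would call `has_changed()` at a fixed interval and trigger retraining whenever it returns `True`.
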
- Data Validation
  - Schema validation
  - Data completeness checks
  - Duplicate detection
  - Error reporting
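
The validation checks above could look roughly like the following sketch, which works on rows as produced by `csv.DictReader`. The function name and the column names in the usage note are assumptions; the real schema lives in the project's own files.

```python
def validate_rows(rows, required_columns, key_column):
    """Return a list of human-readable problems found in rows.

    Assumes key_column is one of required_columns.
    """
    errors = []
    seen_keys = set()
    for index, row in enumerate(rows, start=1):
        # Schema check: every required column must be present.
        missing = [c for c in required_columns if c not in row]
        if missing:
            errors.append(f"row {index}: missing columns {missing}")
            continue
        # Completeness check: required values must not be blank.
        empty = [
            c for c in required_columns
            if row[c] is None or not str(row[c]).strip()
        ]
        if empty:
            errors.append(f"row {index}: empty values in {empty}")
        # Duplicate check on the key column.
        key = row[key_column]
        if key in seen_keys:
            errors.append(f"row {index}: duplicate {key_column} {key!r}")
        seen_keys.add(key)
    return errors
```

For example, `validate_rows(csv.DictReader(f), ["id", "resume"], "id")` returns an empty list for a clean file and one message per problem otherwise, which can then be written to the log.
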
- Environment Setup

      # Create Python 3.11 virtual environment
      python3.11 -m venv thevenv
      source thevenv/bin/activate  # or `thevenv\Scripts\activate` on Windows
- Install Dependencies

      pip install -r requirements.txt
- Start the Application

      # Set model type (optional)
      export MODEL_TYPE=transformer  # or 'tfidf'
      # Run the main application
      python src/main.py
- Launch Dashboard

      streamlit run src/dashboard.py
- Set the `MODEL_TYPE` environment variable:
  - `transformer`: Use transformer-based model (default)
  - `tfidf`: Use TF-IDF based model
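
Reading this setting in Python might look like the sketch below. The function name is hypothetical, and lower-casing the value and rejecting unknown values are assumptions about how the application behaves.

```python
import os

VALID_MODEL_TYPES = ("transformer", "tfidf")


def resolve_model_type(environ=None):
    """Return the configured model type, defaulting to 'transformer'."""
    environ = os.environ if environ is None else environ
    value = environ.get("MODEL_TYPE", "transformer").strip().lower()
    if value not in VALID_MODEL_TYPES:
        raise ValueError(
            f"MODEL_TYPE must be one of {VALID_MODEL_TYPES}, got {value!r}"
        )
    return value
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function easy to test without changing the real environment.
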
- Automatic detection of Apple Silicon
- MPS acceleration when available
- Graceful fallback to CPU processing
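
The device fallback described above can be sketched with PyTorch's MPS backend check. `select_device` is a hypothetical helper, not necessarily the name used in the project.

```python
def select_device():
    """Return 'mps' when PyTorch can use Apple's Metal backend, else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so nothing to accelerate
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can be passed as the `device` argument when loading the model, so the same code runs unchanged on Apple Silicon and on other machines.
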
- `transformer_model.py`: Transformer-based matching implementation
- `model_training.py`: Model training and evaluation logic
- `data_preprocessing.py`: Data cleaning and preparation
- `applicant_screening.py`: Candidate matching logic
- `dashboard.py`: Interactive UI implementation
- `file_monitor.py`: Data change monitoring
- `validation.py`: Data validation
- `config.yaml`: System configuration
- `logger.py`: Logging system
- `job_posts.csv`: Job requirements and descriptions
- `applicants.csv`: Applicant profiles and resumes
- `screening.log`: System activity logs
- Batch processing for large datasets
- Hardware-specific optimizations
- Memory-efficient operations
- Automatic model versioning
- ML Improvements
  - Additional transformer models
  - Custom model fine-tuning
  - Multi-lingual support
- System Features
  - API integration
  - Automated report generation
  - Advanced analytics dashboard
- Performance
  - Distributed processing
  - Advanced caching
  - Real-time updates
- Monitor `logs/screening.log`
- Regular model performance evaluation
- Data quality checks
- System health monitoring
For technical issues:
- Check system logs
- Verify data format
- Review configuration
- Contact system administrator
- Fork repository
- Create feature branch
- Implement changes
- Submit pull request
MIT License