Document Compass is an open-source platform that helps organizations intelligently organize, discover, and utilize their documents through AI-powered similarity matching and smart grouping. Built with both enterprise and nonprofit use cases in mind, it specifically addresses challenges in low-bandwidth environments and offers integration with popular cloud storage providers.
- Enable intelligent document discovery across large document collections
- Reduce time spent searching for related documents by 70%
- Make document management accessible in low-bandwidth environments
- Provide actionable insights through document summarization and grouping
- Integrate seamlessly with existing cloud storage solutions
- Nonprofits managing program documentation
- Organizations with distributed teams
- Educational institutions organizing learning materials
- Research teams managing related papers and studies
- Any team struggling with document discovery and organization
- Smart Document Grouping: Automatically identify and group similar documents
- Intelligent Summarization: Generate concise summaries at multiple detail levels
- Low-Bandwidth Optimization: Compressed previews and progressive loading
- Cloud Storage Integration: Native support for Google Drive and Dropbox
- Flexible Search: Find documents by content, metadata, or similarity
- Machine learning-powered similarity detection
- Efficient document vectorization and indexing
- Scalable architecture supporting millions of documents
- REST API for easy integration
- Containerized deployment for simple scaling
- Python 3.9+
- FastAPI for REST API
- Sentence Transformers for document embedding
- PostgreSQL for metadata storage
- Redis for caching
- React 18+
- Next.js for server-side rendering
- TailwindCSS for styling
- ShadcnUI for components
- Docker for containerization
- GitHub Actions for CI/CD
- Fly.io for deployment
- MinIO for object storage
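The similarity detection and grouping features listed above can be sketched in a few lines. This is a minimal illustration, not the project's actual engine: it assumes a simple bag-of-words vector as a stand-in for the Sentence Transformers embeddings named in the stack, and the function names (`vectorize`, `cosine_similarity`, `group_documents`) and the `threshold` default are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Stand-in embedding (assumption): a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.5):
    """Group documents whose pairwise similarity meets the threshold.

    docs maps a document name to its text. Two documents are linked when
    their similarity is >= threshold, and every connected set of links
    becomes one group (single-linkage grouping via union-find).
    """
    names = list(docs)
    vectors = {name: vectorize(docs[name]) for name in names}
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the group root, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(vectors[first], vectors[second]) >= threshold:
                parent[find(second)] = find(first)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

Swapping `vectorize` for a real embedding model would change the vector type but not the grouping logic. The pairwise comparison is quadratic in the number of documents, so a collection of millions would need an approximate nearest-neighbor index instead.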
# Backend
Python 3.9+
PostgreSQL 13+
Redis 6+
# Frontend
Node.js 18+
npm 8+
# Infrastructure
Docker 20.10+
docker-compose 2.0+
# Clone the repository
git clone https://github.com/opportunity-hack/document-compass.git
# Install dependencies
cd document-compass
pip install -r requirements.txt
# Run in a subshell so the next commands still execute from the repository root
(cd packages/interface && npm install)
# Set up environment
cp .env.example .env
# Edit .env with your configurations
# Start development environment
docker-compose up -d
# Run migrations
python manage.py migrate
# Start backend
python manage.py runserver
# Start frontend (new terminal)
cd packages/interface && npm run dev
document-compass/
├── packages/
│ ├── core/ # Core similarity engine
│ ├── navigator/ # Search & grouping
│ ├── api/ # FastAPI application
│ └── interface/ # React frontend
├── docs/ # Documentation
├── examples/ # Usage examples
├── tests/ # Test suites
└── deployment/ # Deployment configs
We welcome contributions! See our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
- 100% test coverage for core functionality
- Type hints for Python code
- ESLint compliance for JavaScript/TypeScript
- Comprehensive documentation
As a user, I would like to upload or sync documents from Google Drive. I would like the app to show me which of my documents are similar to one another and to group them into folders based on that similarity.
- Core similarity engine
- Basic document grouping
- Google Drive integration
- Initial API release
As a product manager, I would like users to be able to connect either Dropbox or Google Drive, so that people on the most common cloud drive platforms can use what we have built. As a user, I would like my documents to be summarized and easily searchable. I would also like to use this application from my mobile device.
- Dropbox integration
- Advanced summarization
- Batch processing
- Mobile-responsive UI
- Enterprise features
- Advanced permission system
- Custom ML model training
- API rate limiting
- Additional storage providers
- Advanced analytics
- Workflow automation
- Enterprise SSO
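To make the summarization item on the roadmap concrete, here is a rough sketch of what "multiple detail levels" could mean. It assumes a simple extractive approach (keep the sentences whose words are most frequent in the document) rather than whatever model the project ends up using; the `DETAIL_LEVELS` names and sentence budgets, and the `summarize` function itself, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical detail levels, each mapped to a maximum number of sentences.
DETAIL_LEVELS = {"brief": 1, "standard": 2, "detailed": 4}


def summarize(text, level="standard"):
    """Return an extractive summary with at most DETAIL_LEVELS[level] sentences."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def score(sentence):
        # Average document-wide frequency of the words in this sentence.
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    budget = DETAIL_LEVELS[level]
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:budget])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Because every level selects from the same ranking, a shorter summary is always a subset of a longer one, which would also suit progressive loading on slow connections.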
We track the following metrics to measure project success:
- Document discovery time reduction
- Bandwidth savings
- User engagement with summaries
- Group accuracy rates
- API response times
- Processing speed
- System uptime
- Error rates
- JWT-based authentication
- Role-based access control
- Document encryption at rest
- Regular security audits
- GDPR compliance built-in
This project is licensed under the MIT License - see the LICENSE file for details.
Special thanks to:
- The Opportunity Hack community
- Our open-source contributors
- Organizations providing valuable feedback
- Project Link: https://github.com/opportunity-hack/document-compass
- Discussion Forum: GitHub Discussions
- Issue Tracker: GitHub Issues
Made with ❤️ by the Opportunity Hack Team