Search API


Overview

The Search API is a high-performance gRPC-based service designed exclusively for executing advanced search queries over indexed documents in Apache Solr. This API leverages semantic vector embeddings to enhance search accuracy and relevance, making it ideal for applications requiring intelligent and context-aware document retrieval.

Note: Document indexing is handled by a separate service. This API focuses solely on search functionalities.

Features

  • gRPC-Based Communication: High-performance, low-latency communication using gRPC.
  • Semantic Vector Search: Utilize vector embeddings for semantic understanding and accurate search results.
  • Keyword Search: Traditional keyword-based search capabilities.
  • Caching Mechanism: In-memory caching of vector embeddings to optimize performance and reduce redundant calculations, with persistence to disk.
  • Flexible Search Strategies: Combine semantic and keyword search strategies using logical operators.
  • Highlighting: Highlight matched terms and snippets in search results.
  • Faceted Search: Advanced faceting options for refined search results.

Technology Stack

  • Programming Language: Java 17+
  • Framework: Micronaut
  • Search Engine: Apache Solr (Version 9.7.0)
  • Communication Protocol: gRPC
  • Containerization: Docker
  • Testing: JUnit 5, Testcontainers
  • Build Tool: Maven

Getting Started

Prerequisites

  • Java: Ensure Java 17 or higher is installed.
  • Docker: Required for running containerized services.
  • Maven: For building the project.

Installation

  1. Clone the repository and build it:
    git clone https://github.com/krickert/search-api.git
    cd search-api
    ./mvnw install
    

Search API Requirements

Overview

The Search API is designed to provide a comprehensive search functionality over indexed documents in a Solr-based search engine. The API will support both keyword and semantic searches and will integrate with an Embedding Service for vector-based querying.

Functional Requirements

1. Document Indexing

  • Inline Vectors: Support indexing where the vector representation is included within the same document.
  • Embedded Documents: Allow chunking of fields and embedding documents within the main document.
  • Outside Join: Index chunked fields into a new collection for advanced queries.

2. Search Functionality

The API will support various query types to enhance search capabilities:

  • Semantic Matching: Utilize vector-based search for retrieving documents that are semantically similar to the query.
  • Keyword Matching: Perform traditional keyword-based search queries.
  • Keyword with Semantic Boost: Combine keyword search with an additional boost from semantic vectors.
  • Semantic with Keyword Boost: Boost semantic search results using keyword matching.
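One possible way to express these four query types as Solr request parameters is sketched below. It is an illustration under stated assumptions, not this project's implementation: the field names `vector` and `body` and the class `QueryStrategySketch` are hypothetical, and the "boost" variants are modeled here with Solr's rerank query (`rq`), which is only one of several ways to blend the two signals. The `{!knn}` parser with its `f` and `topK` parameters is Solr's dense-vector query parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch only: one possible mapping of the query types above onto Solr request parameters. */
final class QueryStrategySketch {

    // Hypothetical field names; the real schema may differ.
    static final String VECTOR_FIELD = "vector";
    static final String TEXT_FIELD = "body";

    /** Formats a query for Solr's {!knn} parser: local params followed by the vector literal. */
    static String knn(float[] embedding, int topK) {
        StringJoiner values = new StringJoiner(", ", "[", "]");
        for (float v : embedding) {
            values.add(Float.toString(v));
        }
        return "{!knn f=" + VECTOR_FIELD + " topK=" + topK + "}" + values;
    }

    /** Builds a quoted phrase query, escaping backslashes and double quotes. */
    static String keyword(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return TEXT_FIELD + ":\"" + escaped + "\"";
    }

    /** Semantic matching: the vector query is the main query. */
    static Map<String, String> semantic(float[] embedding, int topK) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", knn(embedding, topK));
        return params;
    }

    /** Keyword matching: a plain text query. */
    static Map<String, String> keywordOnly(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", keyword(text));
        return params;
    }

    /** Keyword with semantic boost: keyword main query, reranked by the vector query. */
    static Map<String, String> keywordWithSemanticBoost(String text, float[] embedding, int topK, int weight) {
        Map<String, String> params = keywordOnly(text);
        params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + topK + " reRankWeight=" + weight + "}");
        params.put("rqq", knn(embedding, topK));
        return params;
    }

    /** Semantic with keyword boost: vector main query, reranked by the keyword query. */
    static Map<String, String> semanticWithKeywordBoost(String text, float[] embedding, int topK, int weight) {
        Map<String, String> params = semantic(embedding, topK);
        params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + topK + " reRankWeight=" + weight + "}");
        params.put("rqq", keyword(text));
        return params;
    }

    public static void main(String[] args) {
        float[] embedding = {0.5f, 0.25f};
        keywordWithSemanticBoost("vector search", embedding, 10, 3)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Keeping the parameters in an ordered map makes each strategy easy to inspect in tests before handing it to whichever Solr client the service uses.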

3. Query Configuration

  • Dynamic Configuration: Allow dynamic configuration of Solr collections and query parameters.
  • Vector Configuration: Support inline, embedded, or external collection configurations for vector fields.
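The three vector layouts could be modeled as a small configuration type. The sketch below is hypothetical throughout: `VectorLayout`, `VectorFieldConfig`, and all field names are invented for illustration and do not come from the project's code. It only shows how a configuration might decide which collection a vector query should target.

```java
import java.util.Objects;

/** Hypothetical enumeration of where a document's vectors are stored. */
enum VectorLayout {
    INLINE,    // vector field lives on the main document
    EMBEDDED,  // chunks are embedded (nested) inside the main document
    EXTERNAL   // chunks are indexed into a separate collection
}

/** Hypothetical per-field configuration; names are illustrative only. */
record VectorFieldConfig(String collection, String vectorField, VectorLayout layout, String externalCollection) {

    VectorFieldConfig {
        Objects.requireNonNull(collection, "collection");
        Objects.requireNonNull(vectorField, "vectorField");
        Objects.requireNonNull(layout, "layout");
        // An external layout is meaningless without a target collection.
        if (layout == VectorLayout.EXTERNAL && (externalCollection == null || externalCollection.isBlank())) {
            throw new IllegalArgumentException("EXTERNAL layout requires externalCollection");
        }
    }

    /** Collection that a vector query should be sent to under this configuration. */
    String vectorSearchCollection() {
        return layout == VectorLayout.EXTERNAL ? externalCollection : collection;
    }
}
```

For example, `new VectorFieldConfig("documents", "body_vector", VectorLayout.EXTERNAL, "document_chunks").vectorSearchCollection()` yields `document_chunks`, while an inline configuration returns the main collection.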

4. Query Execution

  • Support for Filter Queries: Implement filtering (fq) for refined search results.
  • Highlighting: Use Solr's highlighting capabilities to highlight matched snippets in search results.
  • Matched Snippets: Return matched snippets from either chunked or inline text.
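As a rough illustration of these execution options, the sketch below assembles the standard Solr parameters `fq`, `hl`, `hl.fl`, and `hl.snippets` into a URL query string. The class `SelectParamsSketch` and the field names in the example are assumptions made for this sketch; the service itself may build requests through a Solr client rather than raw query strings.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch only: assembles filter and highlighting parameters for a Solr select request. */
final class SelectParamsSketch {

    /** Returns parameters as an ordered list because "fq" may legitimately repeat. */
    static List<Map.Entry<String, String>> build(String query, List<String> filters,
                                                 String highlightField, int snippets) {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        params.add(Map.entry("q", query));
        for (String filter : filters) {
            params.add(Map.entry("fq", filter));      // each filter narrows the result set
        }
        params.add(Map.entry("hl", "true"));          // enable highlighting
        params.add(Map.entry("hl.fl", highlightField)); // field to take snippets from
        params.add(Map.entry("hl.snippets", Integer.toString(snippets)));
        return params;
    }

    /** URL-encodes names and values and joins them with '&'. */
    static String toQueryString(List<Map.Entry<String, String>> params) {
        return params.stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Field names "body" and "type" are hypothetical.
        System.out.println(toQueryString(build("body:solr", List.of("type:pdf"), "body", 2)));
    }
}
```

A list of entries is used instead of a map so that several `fq` clauses can be sent in the same request.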

5. Vector Embedding

  • Embedding Service Integration: Integrate with an EmbeddingService to generate embeddings for text.
  • gRPC Protocol: Use gRPC for communication with the embedding service.
  • Caching: Implement caching for embeddings to minimize redundant calculations. The cache should:
    • Be in-memory and shared between tests.
    • Persist to disk at the end of the execution.
    • Use document IDs as keys for vector retrieval.
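The caching behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the project's cache: the class name `EmbeddingCacheSketch` and its methods are hypothetical, and Java serialization is used for persistence only because it needs no extra dependencies.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Sketch only: in-memory embedding cache keyed by document ID, with disk persistence. */
final class EmbeddingCacheSketch {

    private final ConcurrentHashMap<String, float[]> vectors = new ConcurrentHashMap<>();

    /** Returns the cached vector, computing and storing it only on the first request. */
    float[] get(String documentId, Supplier<float[]> compute) {
        return vectors.computeIfAbsent(documentId, id -> compute.get());
    }

    int size() {
        return vectors.size();
    }

    /** Writes a snapshot of the cache to disk, for example at the end of a test run. */
    void saveTo(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new HashMap<>(vectors));
        }
    }

    /** Restores a cache previously written by saveTo. */
    @SuppressWarnings("unchecked")
    static EmbeddingCacheSketch loadFrom(Path file) throws IOException, ClassNotFoundException {
        EmbeddingCacheSketch cache = new EmbeddingCacheSketch();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            cache.vectors.putAll((Map<String, float[]>) in.readObject());
        }
        return cache;
    }

    public static void main(String[] args) throws Exception {
        EmbeddingCacheSketch cache = new EmbeddingCacheSketch();
        // The supplier stands in for a call to the embedding service.
        cache.get("doc-1", () -> new float[] {0.1f, 0.2f});
        Path file = Files.createTempFile("embeddings", ".bin");
        cache.saveTo(file);
        System.out.println("restored entries: " + loadFrom(file).size());
        Files.deleteIfExists(file);
    }
}
```

`computeIfAbsent` ensures the embedding call runs at most once per document ID even when several threads ask for the same vector at the same time.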

6. Performance

  • Scalability: Ensure the API can handle a large number of requests and documents efficiently.
  • Asynchronous Processing: Support asynchronous processing for queries and embedding generation.
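Asynchronous processing of embedding generation followed by a query can be sketched with `CompletableFuture`. The class `AsyncSearchSketch` and the two function parameters are hypothetical placeholders for the embedding client and the Solr query; the sketch only shows how the two stages might be chained without blocking the caller.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Sketch only: chains embedding generation and vector search as asynchronous stages. */
final class AsyncSearchSketch {

    /**
     * Embeds the query text, then runs the vector search on the result.
     * Both functions are stand-ins for remote calls (embedding service, Solr).
     */
    static CompletableFuture<List<String>> search(String queryText,
                                                  Function<String, float[]> embedder,
                                                  Function<float[], List<String>> vectorSearch,
                                                  Executor executor) {
        return CompletableFuture
                .supplyAsync(() -> embedder.apply(queryText), executor)
                .thenApplyAsync(vectorSearch, executor);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<String> ids = search(
                    "vector search",
                    text -> new float[] {text.length()},       // fake embedding
                    vector -> List.of("doc-" + (int) vector[0]), // fake search result
                    pool).join();
            System.out.println(ids);
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing the executor explicitly keeps slow remote calls off the common pool and lets the caller size the thread pool for expected load.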

Non-Functional Requirements

1. Security

  • Authentication: Implement secure authentication mechanisms for accessing the API.
  • Data Protection: Ensure data integrity and protection during transmission.

2. Logging and Monitoring

  • Logging: Implement logging for debugging and monitoring purposes.
  • Metrics: Collect metrics for API usage, performance, and error rates.

3. Documentation

  • API Documentation: Provide clear and comprehensive documentation for all API endpoints and usage examples.
  • Developer Guides: Include developer guides for setup, configuration, and integration.

Technical Requirements

1. Technology Stack

  • Search Engine: Apache Solr (Version 9.7.0)
  • Programming Language: Java
  • Frameworks: Micronaut for building the API.
  • Containerization: Use Docker for service containerization.
  • gRPC: For communication with the embedding service.

2. Environment Setup

  • Development Environment: Ensure a local development setup for testing and debugging.
  • Test Containers: Utilize Testcontainers for integration testing with Solr.

Testing Requirements

1. Unit Testing

  • Component Tests: Ensure that each component of the API is unit tested for functionality.

2. Integration Testing

  • End-to-End Tests: Implement integration tests that cover end-to-end scenarios, including indexing, querying, and caching.

3. Performance Testing

  • Load Testing: Conduct load testing to ensure the API can handle expected traffic.

Future Enhancements

  • User Feedback Integration: Gather user feedback to improve search relevance and API usability.
  • Advanced Query Features: Explore advanced query features like faceting, sorting, and recommendation systems.

Conclusion

This document outlines the comprehensive requirements for the development of the Search API. The focus is on delivering a robust, efficient, and user-friendly API that leverages the power of Solr and embedding technologies for enhanced search capabilities.

Below is a set of requirements

Micronaut 4.6.2 Documentation


Feature hamcrest documentation

Feature openapi-explorer documentation

Feature micronaut-aot documentation

Feature test-resources documentation

Feature testcontainers documentation

Feature validation documentation

Feature annotation-api documentation

Feature security-oauth2 documentation

Feature openapi documentation

Feature mockito documentation

Feature http-client documentation

Feature management documentation

Feature swagger-ui documentation

Feature awaitility documentation

Feature maven-enforcer-plugin documentation

Feature assertj documentation

About

Search API for the vector-based search engine ecosystem
