This module converts unstructured data to embeddings, which are stored in the VectorStore. Vector similarity can then be used for semantic search.
The `TextEncoder` converts text inputs to embeddings. To work with the LangChain agent and vector store, this module must implement the LangChain `Embeddings` interface.
By default, it uses a modified version of LangChain's `HuggingFaceEmbeddings`. The modified HuggingFace encoder can return normalized embeddings, controlled by the `norm` parameter.
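For context, normalization here means scaling each vector to unit L2 length, so that cosine similarity reduces to a dot product. A minimal illustration of the idea (not the library's actual code):

```python
from typing import List


def l2_normalize(embedding: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm, as norm=True does for each embedding."""
    length = sum(x * x for x in embedding) ** 0.5
    return [x / length for x in embedding] if length > 0 else embedding
```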
To configure the default encoder, change parameter values in `config.py`. Modify the embedding configs to, for example, switch models or disable normalization:
```python
# Text embedding
TEXTENCODER_CONFIG = {
    'model': 'multi-qa-mpnet-base-cos-v1',
    'norm': True
}
```
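Under the hood, the default encoder presumably forwards these values to the underlying model. A rough sketch of the wiring (the exact constructor call is an assumption; stock `HuggingFaceEmbeddings`, unlike the project's modified encoder, does not accept `norm`):

```python
from langchain.embeddings import HuggingFaceEmbeddings

from config import TEXTENCODER_CONFIG

# Hypothetical wiring: stock HuggingFaceEmbeddings only takes the model name;
# the project's modified encoder additionally consumes the 'norm' flag.
base_encoder = HuggingFaceEmbeddings(model_name=TEXTENCODER_CONFIG['model'])
```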
Then the encoder can be used as follows:

```python
from text_encoder import TextEncoder

encoder = TextEncoder()

# Generate embeddings for a list of documents
doc_embeddings = encoder.embed_documents(['test'])

# Generate embedding for a text input
query_embedding = encoder.embed_query('test')
```
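With the default config above (`norm=True`), each returned vector should have unit L2 length. A quick sanity check, assuming NumPy is available:

```python
import numpy as np

# embed_query returns one vector; with norm=True its L2 norm should be ~1.0.
assert abs(np.linalg.norm(query_embedding) - 1.0) < 1e-6

# embed_documents returns one vector per input document.
assert len(doc_embeddings) == 1
```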
To change the embedding method used in operations, modify `__init__.py` to import a different `TextEncoder`. For example, changing `.langchain_huggingface` to `.openai_embedding` in the init file switches to OpenAI embeddings. Remember to update `config.py` as well for the new embedding method.
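A sketch of that one-line switch in `__init__.py` (assuming the submodules are named as above):

```python
# Default: the modified HuggingFace encoder.
# from .langchain_huggingface import TextEncoder

# Switch to OpenAI embeddings (remember to update config.py as well).
from .openai_embedding import TextEncoder
```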
To customize the module, the text encoder should inherit from the LangChain `Embeddings` base class, overriding the `embed_documents` and `embed_query` methods:
```python
from typing import List

from langchain.embeddings.base import Embeddings


def test_encoder(text: str) -> List[float]:
    """Dummy encoder for illustration: returns a fixed 2-dimensional embedding."""
    return [0., 0.]


class TextEncoder(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed search docs."""
        outputs = []
        for t in texts:
            embed = test_encoder(t)
            outputs.append(embed)
        return outputs

    def embed_query(self, text: str) -> List[float]:
        """Embed query text."""
        return test_encoder(text)
```
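Once defined, the custom encoder drops in anywhere the default one is used; with the dummy `test_encoder` above:

```python
encoder = TextEncoder()

print(encoder.embed_query('hello'))         # [0.0, 0.0]
print(encoder.embed_documents(['a', 'b']))  # [[0.0, 0.0], [0.0, 0.0]]
```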