diff --git a/README.md b/README.md index bfcd26d0..6ec33eee 100644 --- a/README.md +++ b/README.md @@ -53,6 +53,7 @@ BGE (BAAI General Embedding) focuses on retrieval-augmented LLMs, consisting of ## News +- 05/12/2024: :book: We built the BGE documentation for centralized BGE information and materials. - 10/29/2024: :earth_asia: We created WeChat group for BGE. Scan the [QR code](./imgs/BGE_WeChat_Group.png) to join the group chat! To get the first hand message about our updates and new release, or having any questions or ideas, join us now! - bge_wechat_group @@ -109,16 +110,16 @@ Clone the repository and install ``` git clone https://github.com/FlagOpen/FlagEmbedding.git cd FlagEmbedding -# If you do not want to finetune the models, you can install the package without the finetune dependency: +# If you do not need to finetune the models, you can install the package without the finetune dependency: pip install . -# If you want to finetune the models, you can install the package with the finetune dependency: +# If you want to finetune the models, install the package with the finetune dependency: # pip install .[finetune] ``` For development in editable mode: ``` -# If you do not want to finetune the models, you can install the package without the finetune dependency: +# If you do not need to finetune the models, you can install the package without the finetune dependency: pip install -e . -# If you want to finetune the models, you can install the package with the finetune dependency: +# If you want to finetune the models, install the package with the finetune dependency: # pip install -e .[finetune] ``` diff --git a/docs/source/API/evaluation/beir/data_loader.rst b/docs/source/API/evaluation/beir/data_loader.rst index de224fa1..48a7aeab 100644 --- a/docs/source/API/evaluation/beir/data_loader.rst +++ b/docs/source/API/evaluation/beir/data_loader.rst @@ -1,4 +1,4 @@ data loader =========== -.. 
autoclass:: FlagEmbedding.abc.evaluation.BEIREvalDataLoader \ No newline at end of file +.. autoclass:: FlagEmbedding.evaluation.beir.BEIREvalDataLoader \ No newline at end of file diff --git a/docs/source/FAQ/index.rst b/docs/source/FAQ/index.rst index 7c81a820..ffbc386a 100644 --- a/docs/source/FAQ/index.rst +++ b/docs/source/FAQ/index.rst @@ -5,33 +5,46 @@ Below are some commonly asked questions. .. tip:: - For more questions, search issues on GitHub or join our community! + For more questions, search the issues on GitHub or join our community! +.. dropdown:: Having network issues when connecting to Hugging Face? + :animate: fade-in-slide-down + + Try setting :code:`HF_ENDPOINT` to the `HF mirror `_ instead. + + .. code:: bash + + export HF_ENDPOINT=https://hf-mirror.com .. dropdown:: When does the query instruction need to be used? + :animate: fade-in-slide-down For a retrieval task that uses short queries to find long related documents, it is recommended to add instructions for these short queries. The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task. In all cases, the documents/passages do not need to add the instruction. .. dropdown:: Why it takes quite long to just encode 1 sentence? + :animate: fade-in-slide-down Note that if you have multiple CUDA GPUs, FlagEmbedding will automatically use all of them. Then the time used to start the multi-process will cost way longer than the actual encoding. Try to just use CPU or just single GPU for simple tasks. .. dropdown:: The embedding results are different for CPU and GPU? + :animate: fade-in-slide-down The encode function will use FP16 by default if GPU is available, which leads to different precision. Set :code:`fp16=False` to get full precision. .. dropdown:: How many languages do the multi-lingual models support? + :animate: fade-in-slide-down The training datasets cover up to 170+ languages. 
But note that due to the unbalanced distribution of languages, the performances will be different. Please further test refer to the real application scenario. .. dropdown:: How does the different retrieval method works in bge-m3? + :animate: fade-in-slide-down - Dense retrieval: map the text into a single embedding, e.g., `DPR `_, `BGE-v1.5 <../bge/bge_v1_v1.5>`_ - Sparse retrieval (lexical matching): a vector of size equal to the vocabulary, with the majority of positions set to zero, calculating a weight only for tokens present in the text. @@ -39,5 +52,11 @@ Below are some commonly asked questions. - Multi-vector retrieval: use multiple vectors to represent a text, e.g., `ColBERT `_. .. dropdown:: Recommended vector database? + :animate: fade-in-slide-down + + Generally you can use any vector database (open-source or commercial). We use `Faiss `_ by default in our evaluation pipeline and tutorials. + +.. dropdown:: Not enough VRAM or OOM error during evaluation? + :animate: fade-in-slide-down - Generally you can use any vector database (open-sourced, commercial). We use `Faiss `_ by default in our evaluation pipeline and tutorials. \ No newline at end of file + The default values of :code:`embedder_batch_size` and :code:`reranker_batch_size` are both 3000. Try a smaller value. diff --git a/docs/source/Introduction/index.rst b/docs/source/Introduction/index.rst index 1e83aa27..e20e021d 100644 --- a/docs/source/Introduction/index.rst +++ b/docs/source/Introduction/index.rst @@ -7,14 +7,22 @@ BGE builds one-stop retrieval toolkit for search and RAG. We provide inference, :width: 700 :align: center - BGE embedder and reranker in an RAG pipelin. `Source `_ + BGE embedder and reranker in an RAG pipeline. `Source `_ Quickly get started with: .. toctree:: :maxdepth: 1 + :caption: Start + overview installation quick_start - concept + + +.. 
toctree:: + :maxdepth: 1 + :caption: Concept + + model retrieval_demo \ No newline at end of file diff --git a/docs/source/Introduction/installation.rst b/docs/source/Introduction/installation.rst index d54739b9..a4f3029b 100644 --- a/docs/source/Introduction/installation.rst +++ b/docs/source/Introduction/installation.rst @@ -6,7 +6,7 @@ Installation Using pip: ---------- -If you do not want to finetune the models, you can install the package without the finetune dependency: +If you do not need to finetune the models, you can install the package without the finetune dependency: .. code:: bash @@ -28,18 +28,18 @@ Clone the repository and install git clone https://github.com/FlagOpen/FlagEmbedding.git cd FlagEmbedding - # If you do not want to finetune the models, you can install the package without the finetune dependency: + # If you do not need to finetune the models, you can install the package without the finetune dependency: pip install . - # If you want to finetune the models, you can install the package with the finetune dependency: + # If you want to finetune the models, install the package with the finetune dependency: pip install .[finetune] For development in editable mode: .. code:: bash - # If you do not want to finetune the models, you can install the package without the finetune dependency: + # If you do not need to finetune the models, you can install the package without the finetune dependency: pip install -e . 
- # If you want to finetune the models, you can install the package with the finetune dependency: + # If you want to finetune the models, install the package with the finetune dependency: pip install -e .[finetune] PyTorch-CUDA diff --git a/docs/source/Introduction/concept.rst b/docs/source/Introduction/model.rst similarity index 85% rename from docs/source/Introduction/concept.rst rename to docs/source/Introduction/model.rst index 616a1e14..295171f7 100644 --- a/docs/source/Introduction/concept.rst +++ b/docs/source/Introduction/model.rst @@ -1,10 +1,12 @@ -Concept -======= +Model +===== + +If you are already familiar with the concepts, take a look at the :doc:`BGE models <../bge/index>`! Embedder -------- -Embedder, or embedding model, is a model designed to convert data, usually text, codes, or images, into sparse or dense numerical vectors (embeddings) in a high dimensional vector space. +Embedder, or embedding model, bi-encoder, is a model designed to convert data, usually text, codes, or images, into sparse or dense numerical vectors (embeddings) in a high dimensional vector space. These embeddings capture the semantic meaning or key features of the input, which enable efficient comparison and analysis. A very famous demonstration is the example from `word2vec `_. It shows how word embeddings capture semantic relationships through vector arithmetic: diff --git a/docs/source/Introduction/overview.rst b/docs/source/Introduction/overview.rst new file mode 100644 index 00000000..391185ec --- /dev/null +++ b/docs/source/Introduction/overview.rst @@ -0,0 +1,13 @@ +Overview +======== + +Our repository provides well-structured `APIs `_ for the inference, evaluation, and fine-tuning of BGE series models. +Besides that, there are abundant resources of `tutorials `_ and `examples `_ for users to quickly get a hands-on experience. + +.. 
figure:: https://raw.githubusercontent.com/FlagOpen/FlagEmbedding/refs/heads/master/imgs/projects.png + :width: 700 + :align: center + + Structure of contents in our `repo `_ + +Our repository provides well-structured contents \ No newline at end of file diff --git a/docs/source/Introduction/quick_start.rst b/docs/source/Introduction/quick_start.rst index d5a064b7..750f3154 100644 --- a/docs/source/Introduction/quick_start.rst +++ b/docs/source/Introduction/quick_start.rst @@ -7,9 +7,7 @@ First, load one of the BGE embedding model: from FlagEmbedding import FlagAutoModel - model = FlagAutoModel.from_finetuned('BAAI/bge-base-en-v1.5', - query_instruction_for_retrieval="Represent this sentence for searching relevant passages:", - use_fp16=True) + model = FlagAutoModel.from_finetuned('BAAI/bge-base-en-v1.5') .. tip:: @@ -22,6 +20,7 @@ First, load one of the BGE embedding model: Then, feed some sentences to the model and get their embeddings: .. code:: python + sentences_1 = ["I love NLP", "I love machine learning"] sentences_2 = ["I love BGE", "I love text retrieval"] embeddings_1 = model.encode(sentences_1) diff --git a/docs/source/bge/bge_m3.rst b/docs/source/bge/bge_m3.rst index 9ae05bc4..2cfe5941 100644 --- a/docs/source/bge/bge_m3.rst +++ b/docs/source/bge/bge_m3.rst @@ -119,5 +119,5 @@ Usage Useful Links: `API <../API/inference/embedder/encoder_only/M3Embedder>`_ -`Tutorial <>`_ +`Tutorial `_ `Example `_ \ No newline at end of file diff --git a/docs/source/bge/bge_reranker.rst b/docs/source/bge/bge_reranker.rst index b25a52fb..4c79dc52 100644 --- a/docs/source/bge/bge_reranker.rst +++ b/docs/source/bge/bge_reranker.rst @@ -1,2 +1,37 @@ BGE-Reranker ============ + +Different from embedding model, reranker, or cross-encoder uses question and document as input and directly output similarity instead of embedding. +To balance the accuracy and time cost, cross-encoder is widely used to re-rank top-k documents retrieved by other simple models. 
+For example, use a bge embedding model to first retrieve the top 100 relevant documents, and then use a bge reranker to re-rank those 100 documents to get the final top-3 results. + +The first series of BGE-Reranker contains two models, large and base. + ++-------------------------------------------------------------------------------+-----------------------+------------+--------------+-----------------------------------------------------------------------+ +| Model | Language | Parameters | Model Size | Description | ++===============================================================================+=======================+============+==============+=======================================================================+ +| `BAAI/bge-reranker-large `_ | English & Chinese | 560M | 2.24 GB | Larger reranker model, easy to deploy with better inference | ++-------------------------------------------------------------------------------+-----------------------+------------+--------------+-----------------------------------------------------------------------+ +| `BAAI/bge-reranker-base `_ | English & Chinese | 278M | 1.11 GB | Lightweight reranker model, easy to deploy with fast inference | ++-------------------------------------------------------------------------------+-----------------------+------------+--------------+-----------------------------------------------------------------------+ + +bge-reranker-large and bge-reranker-base use `XLM-RoBERTa-Large `_ and `XLM-RoBERTa-Base `_ as their base models, respectively. +They were trained on high-quality English and Chinese data, and achieved state-of-the-art performance among models of the same size at the time of release. + +Usage +----- + + +.. 
code:: python + + from FlagEmbedding import FlagReranker + + reranker = FlagReranker( + 'BAAI/bge-reranker-base', + query_max_length=256, + use_fp16=True, + devices=['cuda:1'], + ) + + score = reranker.compute_score(['I am happy to help', 'Assisting you is my pleasure']) + print(score) \ No newline at end of file diff --git a/docs/source/bge/bge_reranker_v2.rst b/docs/source/bge/bge_reranker_v2.rst new file mode 100644 index 00000000..abb477f2 --- /dev/null +++ b/docs/source/bge/bge_reranker_v2.rst @@ -0,0 +1,82 @@ +BGE-Reranker-v2 +=============== + ++------------------------------------------------------------------------------------------------------------------+-----------------------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Model | Language | Parameters | Model Size | Description | ++==================================================================================================================+=======================+=============+==============+=========================================================================================================================================================+ +| `BAAI/bge-reranker-v2-m3 `_ | Multilingual | 568M | 2.27 GB | Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference. | ++------------------------------------------------------------------------------------------------------------------+-----------------------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ +| `BAAI/bge-reranker-v2-gemma `_ | Multilingual | 2.51B | 10 GB | Suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. 
| ++------------------------------------------------------------------------------------------------------------------+-----------------------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ +| `BAAI/bge-reranker-v2-minicpm-layerwise `_ | Multilingual | 2.72B | 10.9 GB | Suitable for multilingual contexts, allows freedom to select layers for output, facilitating accelerated inference. | ++------------------------------------------------------------------------------------------------------------------+-----------------------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ +| `BAAI/bge-reranker-v2.5-gemma2-lightweight `_ | Multilingual | 2.72B | 10.9 GB | Suitable for multilingual contexts, allows freedom to select layers, compress ratio and compress layers for output, facilitating accelerated inference. | ++------------------------------------------------------------------------------------------------------------------+-----------------------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + + +.. tip:: Suggessions on model selection + + You can select the model according your senario and resource: + + - For multilingual, utilize :code:`BAAI/bge-reranker-v2-m3`, :code:`BAAI/bge-reranker-v2-gemma` and :code:`BAAI/bge-reranker-v2.5-gemma2-lightweight`. + - For Chinese or English, utilize :code:`BAAI/bge-reranker-v2-m3` and :code:`BAAI/bge-reranker-v2-minicpm-layerwise`. + - For efficiency, utilize :code:`BAAI/bge-reranker-v2-m3` and the low layer of :code:`BAAI/bge-reranker-v2-minicpm-layerwise`. 
+ - For better performance, we recommend :code:`BAAI/bge-reranker-v2-minicpm-layerwise` and :code:`BAAI/bge-reranker-v2-gemma`. + + Always test on your real use case and choose the model with the best speed-quality balance! + +Usage +----- + +Use bge-reranker-v2-m3 in the same way as bge-reranker-base and bge-reranker-large. + +.. code:: python + + from FlagEmbedding import FlagReranker + + # Setting use_fp16 to True speeds up computation with a slight performance degradation + reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True) + + score = reranker.compute_score(['query', 'passage']) + # Or set normalize=True to map the score into the 0-1 range with a sigmoid function + score = reranker.compute_score(['query', 'passage'], normalize=True) + + print(score) + +Use the :code:`FlagLLMReranker` class for bge-reranker-v2-gemma. + +.. code:: python + + from FlagEmbedding import FlagLLMReranker + + # Setting use_fp16 to True speeds up computation with a slight performance degradation + reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_fp16=True) + + score = reranker.compute_score(['query', 'passage']) + print(score) + +Use the :code:`LayerWiseFlagLLMReranker` class for bge-reranker-v2-minicpm-layerwise. + +.. code:: python + + from FlagEmbedding import LayerWiseFlagLLMReranker + + # Setting use_fp16 to True speeds up computation with a slight performance degradation + reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True) + + # Adjust 'cutoff_layers' to pick which layers are used for computing the score. + score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28]) + print(score) + +Use the :code:`LightWeightFlagLLMReranker` class for bge-reranker-v2.5-gemma2-lightweight. + +.. 
code:: python + + from FlagEmbedding import LightWeightFlagLLMReranker + + # Setting use_fp16 to True speeds up computation with a slight performance degradation + reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', use_fp16=True) + + # Adjust 'cutoff_layers' to pick which layers are used for computing the score; 'compress_ratio' and 'compress_layer' control compression for faster inference. + score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40]) + print(score) \ No newline at end of file diff --git a/docs/source/bge/bge_v1_v1.5.rst b/docs/source/bge/bge_v1_v1.5.rst index f99dd1d5..80b6a416 100644 --- a/docs/source/bge/bge_v1_v1.5.rst +++ b/docs/source/bge/bge_v1_v1.5.rst @@ -89,6 +89,7 @@ To use a single GPU: model = FlagModel('BAAI/bge-base-en-v1.5', devices=0) | + Useful Links: `API <../API/inference/embedder/encoder_only/BaseEmbedder>`_ diff --git a/docs/source/bge/index.rst b/docs/source/bge/index.rst index 76791b3f..a6e5dd67 100644 --- a/docs/source/bge/index.rst +++ b/docs/source/bge/index.rst @@ -11,3 +11,9 @@ BGE bge_m3 bge_icl +.. toctree:: + :maxdepth: 1 + :caption: Reranker + + bge_reranker + bge_reranker_v2 \ No newline at end of file