Indexing can not be completed (on Windows) #14

Closed
vanetreg opened this issue Jan 5, 2024 · 29 comments

@vanetreg

vanetreg commented Jan 5, 2024

I'm testing
01-basic_indexing_and_search.ipynb
on a Windows 10 PC, in Cursor IDE, using Python 3.11.6

The cell
RAG.index(collection=[full_document], index_name="Miyazaki", max_document_length=180, split_documents=True)
does not complete, even after almost an hour!

[Jan 05, 10:46:21] #> Creating directory .ragatouille/colbert\indexes/Miyazaki 
#> Starting...

is all that is shown; I restarted the kernel after an hour.

The previous cell, which prints the length of full_document, worked properly.

@bclavie
Collaborator

bclavie commented Jan 5, 2024

Hi @vanetreg, sorry about that! I believe this is related to the issue that stops it working on Google Colab -- everything currently uses multiprocessing even when it doesn't need to, and it hangs in certain environments (outside __main__ in scripts and in some notebook environments like Colab).

I don't have a Windows machine to try this on, but it might be the Windows + Cursor combo. We'll be looking at fixing this shortly (cc @okhat)

@okhat
Collaborator

okhat commented Jan 5, 2024

I think indexing and search should definitely work on colab?

https://colab.research.google.com/github/stanford-futuredata/ColBERT/blob/main/docs/intro2new.ipynb

@bclavie
Collaborator

bclavie commented Jan 5, 2024

I did notice that it works on the main repo, but doesn't with RAGatouille, must be how we handle the Run... I need to track down exactly why, but it actually hangs:
https://colab.research.google.com/drive/1S3s_5FUjzjOCxuwRhdfEdrZoa2LcEsME?usp=sharing

@bclavie added the bug and multiprocessing_issue labels on Jan 5, 2024
@MikeRenwick-ICG

Same issue here, same environment (even Cursor!)

@jponline77

I was using Windows 11, Cursor, Python 3.10 through WSL... Worked for me. So, it may be a Windows (not in WSL) thing. I gotta say it would be hard for me to imagine not working in WSL on a Windows machine myself at this point.

@vanetreg
Author

vanetreg commented Jan 5, 2024

@bclavie @okhat
Please note that I never mentioned (Google) Colab.

I tested again, and while executing the first cell:

from ragatouille import RAGPretrainedModel
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

I got:

d:\***\ragatouille-venv\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

so after installing ipywidgets (requirements?!) and restarting Cursor, the warning above is gone,
but I again had to stop the index-creation cell after 10+ minutes, with this note:

[Jan 05, 23:16:42] #> Note: Output directory .ragatouille/colbert\indexes/Miyazaki already exists
#> Starting...

When I then tried to run the next cell:

k = 3 # How many documents you want to retrieve, defaults to 10, we set it to 3 here for readability
results = RAG.search(query="What animation studio did Miyazaki found?", k=k)
results

I got this error:

NameError                                 Traceback (most recent call last)
d:\Projects\AI_testing\RAGatouille\01-basic_indexing_and_search.ipynb Cell 14 line 2
      1 k = 3 # How many documents you want to retrieve, defaults to 10, we set it to 3 here for readability
----> 2 results = RAG.search(query="What animation studio did Miyazaki found?", k=k)
      3 results
NameError: name 'RAG' is not defined

I checked every cell's execution timestamp, so all previous cells (especially the one where RAG is defined) ran without errors this time.

@bclavie
Collaborator

bclavie commented Jan 6, 2024

Hey, thanks for confirming @MikeRenwick-ICG and shining some light on it being a Windows (non-WSL) issue @jponline77.

@vanetreg While you're not using Google Colab, this is definitely the same multiprocessing issue that's causing it to hang in colab. I believe the issue you're seeing is still the same problem -- RAG isn't defined because the previous cell never actually ran and just timed out. The likely issue is identified (#13), I'll ping you when we get a fix out for it!

so after installing ipywidgets (requirements?!)

d:\***\ragatouille-venv\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

This is a bit of an annoying warning, but it doesn't negatively impact running anything. To avoid overloading the lib with dependencies one wouldn't use outside a notebook, we don't generally add ipython/notebook related dependencies to requirements, but definitely do install it if you're going to be running notebooks a lot!

@vanetreg
Author

vanetreg commented Jan 6, 2024

@bclavie
aren't you using the terms Colab and Jupyter notebook interchangeably? :)

Today I tested this again, and it really must be Windows-related:
after switching on the PC and restarting Windows, the first cell, where RAG is defined, raised the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
d:\Projects\AI_testing\RAGatouille\01-basic_indexing_and_search.ipynb Cell 2 line 1
----> 1 from ragatouille import RAGPretrainedModel
      3 RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\__init__.py:2
      1 __version__ = "0.0.1c"
----> 2 from .RAGPretrainedModel import RAGPretrainedModel
      3 from .RAGTrainer import RAGTrainer
      5 __all__ = ["RAGPretrainedModel", "RAGTrainer"]

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\RAGPretrainedModel.py:3
      1 from typing import Callable, Optional, Union
      2 from pathlib import Path
----> 3 from ragatouille.data.corpus_processor import CorpusProcessor
      4 from ragatouille.data.preprocessors import llama_index_sentence_splitter
      5 from ragatouille.models import LateInteractionModel, ColBERT

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\data\__init__.py:1
----> 1 from .corpus_processor import CorpusProcessor
      2 from .preprocessors import llama_index_sentence_splitter
      3 from .training_data_processor import TrainingDataProcessor

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\data\corpus_processor.py:2
      1 from typing import Callable, Optional, Union
----> 2 from ragatouille.data.preprocessors import llama_index_sentence_splitter
      5 class CorpusProcessor:
      6     def __init__(
      7         self,
      8         document_splitter_fn: Optional[Callable] = llama_index_sentence_splitter,
      9         preprocessing_fn: Optional[Union[Callable, list[Callable]]] = None,
     10     ):

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\ragatouille\data\preprocessors.py:1
----> 1 from llama_index import Document
      2 from llama_index.text_splitter import SentenceSplitter
      5 def llama_index_sentence_splitter(documents: list[str], chunk_size=256):

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\__init__.py:13
     10 from typing import Callable, Optional
     12 # import global eval handler
---> 13 from llama_index.callbacks.global_handlers import set_global_handler
     14 from llama_index.data_structs.struct_type import IndexStructType
     16 # embeddings

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\callbacks\__init__.py:7
      5 from .open_inference_callback import OpenInferenceCallbackHandler
      6 from .schema import CBEvent, CBEventType, EventPayload
----> 7 from .token_counting import TokenCountingHandler
      8 from .utils import trace_method
      9 from .wandb_callback import WandbCallbackHandler

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\callbacks\token_counting.py:6
      4 from llama_index.callbacks.base_handler import BaseCallbackHandler
      5 from llama_index.callbacks.schema import CBEventType, EventPayload
----> 6 from llama_index.utilities.token_counting import TokenCounter
      7 from llama_index.utils import get_tokenizer
     10 @dataclass
     11 class TokenCountingEvent:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\utilities\token_counting.py:6
      1 # Modified from:
      2 # https://github.com/nyno-ai/openai-token-counter
      4 from typing import Any, Callable, Dict, List, Optional
----> 6 from llama_index.llms import ChatMessage, MessageRole
      7 from llama_index.utils import get_tokenizer
     10 class TokenCounter:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\llms\__init__.py:1
----> 1 from llama_index.llms.ai21 import AI21
      2 from llama_index.llms.anthropic import Anthropic
      3 from llama_index.llms.anyscale import Anyscale

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\llms\ai21.py:6
      4 from llama_index.callbacks import CallbackManager
      5 from llama_index.llms.ai21_utils import ai21_model_to_context_size
----> 6 from llama_index.llms.base import llm_chat_callback, llm_completion_callback
      7 from llama_index.llms.custom import CustomLLM
      8 from llama_index.llms.generic_utils import (
      9     completion_to_chat_decorator,
     10     get_from_param_or_env,
     11 )

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\llms\base.py:25
     14 from llama_index.callbacks import CallbackManager, CBEventType, EventPayload
     15 from llama_index.llms.types import (
     16     ChatMessage,
     17     ChatResponse,
   (...)
     23     LLMMetadata,
     24 )
---> 25 from llama_index.schema import BaseComponent
     28 def llm_chat_callback() -> Callable:
     29     def wrap(f: Callable) -> Callable:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\schema.py:16
     13 from typing_extensions import Self
     15 from llama_index.bridge.pydantic import BaseModel, Field, root_validator
---> 16 from llama_index.utils import SAMPLE_TEXT, truncate_text
     18 if TYPE_CHECKING:
     19     from haystack.schema import Document as HaystackDocument

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\utils.py:89
     85             self._stopwords = stopwords.words("english")
     86         return self._stopwords
---> 89 globals_helper = GlobalsHelper()
     92 # Global Tokenizer
     93 @runtime_checkable
     94 class Tokenizer(Protocol):

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\llama_index\utils.py:45, in GlobalsHelper.__init__(self)
     43 def __init__(self) -> None:
     44     """Initialize NLTK stopwords and punkt."""
---> 45     import nltk
     47     self._nltk_data_dir = os.environ.get(
     48         "NLTK_DATA",
     49         os.path.join(
   (...)
     52         ),
     53     )
     55     if self._nltk_data_dir not in nltk.data.path:

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\__init__.py:180
    177 else:
    178     from nltk import cluster
--> 180 from nltk.downloader import download, download_shell
    182 try:
    183     import tkinter

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\downloader.py:2479
   2469             pass
   2472 ######################################################################
   2473 # Main:
   2474 ######################################################################
   (...)
   2477
   2478 # Aliases
-> 2479 _downloader = Downloader()
   2480 download = _downloader.download
   2483 def download_shell():

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\downloader.py:515, in Downloader.__init__(self, server_index_url, download_dir)
    513 # decide where we're going to save things to.
    514 if self._download_dir is None:
--> 515     self._download_dir = self.default_download_dir()

File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\nltk\downloader.py:1072, in Downloader.default_download_dir(self)
   1069 # Check if we have sufficient permissions to install in a
   1070 # variety of system-wide locations.
   1071 for nltkdir in nltk.data.path:
-> 1072     if os.path.exists(nltkdir) and nltk.internals.is_writable(nltkdir):
   1073         return nltkdir
   1075 # On Windows, use %APPDATA%

AttributeError: partially initialized module 'nltk' has no attribute 'internals' (most likely due to a circular import)

Note the line
1075 # On Windows, use %APPDATA%
at the end of the error message.

@vanetreg
Author

vanetreg commented Jan 6, 2024

I was using Windows 11, Cursor, Python 3.10 through WSL... Worked for me. So, it may be a Windows (not in WSL) thing. I gotta say it would be hard for me to imagine not working in WSL on a Windows machine myself at this point.

@jponline77
I tested it in both VSC and Cursor, both with the WSL extension installed.
Maybe the Windows version (10 / 11) matters?

@runonthespot

runonthespot commented Jan 6, 2024

Has anyone tried this outside of Jupyter on Windows? I'm keen to drop this in as a direct replacement for single-vector RAG.

@timothepearce

timothepearce commented Jan 6, 2024

Hi @bclavie,

I'm not sure the problem is related to Colab; I also get an error using Jupyter locally on my Ubuntu server.
The basic readme.md example doesn't work and the cell never finishes executing.

Here's the code and stacktrace if that helps:

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
my_documents = [
    "This is a great excerpt from my wealth of documents",
    "Once upon a time, there was a great document"
]

index_path = RAG.index(index_name="my_index", collection=my_documents)

outputs the following:

[Jan 06, 10:41:35] #> Creating directory .ragatouille/colbert/indexes/my_index 


#> Starting...
#> Starting...
nranks = 2 	 num_gpus = 2 	 device=1
[Jan 06, 10:41:38] [1] 		 #> Encoding 0 passages..
nranks = 2 	 num_gpus = 2 	 device=0
[Jan 06, 10:41:38] [0] 		 #> Encoding 2 passages..
 File "/home/np/miniconda3/envs/np-ml/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 101, in setup
    avg_doclen_est = self._sample_embeddings(sampled_pids)
  File "/home/np/miniconda3/envs/np-ml/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 140, in _sample_embeddings
    self.num_sample_embs = torch.tensor([local_sample_embs.size(0)]).cuda()
AttributeError: 'NoneType' object has no attribute 'size'

@bclavie
Collaborator

bclavie commented Jan 6, 2024

Hey @timothepearce, thanks for flagging! I believe this is a very separate problem (the multiprocessing in your case runs fine, but there seems to be another problem). Could you create a new issue so I can look into it a bit more? And could you try out the notebooks in examples/ ? I think there might be something wrong with the README, which is (probably) that there aren't enough documents in the example (which I could fix by adopting a separate logic for n_docs that are far too small).

@bclavie
Collaborator

bclavie commented Jan 6, 2024

@timothepearce

@bclavie You're right, the code doesn't work in the Python CLI either, and it seems related to the ColBERT library.

I'll open a new issue and dig a little bit more.

@bclavie
Collaborator

bclavie commented Jan 6, 2024

Hey @vanetreg, for your other issue, the partial init -- no idea what's going on there, it seems like something weird happened when initialising nltk?

I've tested some things on my end and I can confirm this is due to how ColBERT does multiprocessing, which causes the issue in some environments (seemingly Colab and Windows 10). This will eventually be fixed once the multiprocessing handling is changed upstream but sadly there doesn't seem to be a good in-notebook workaround on those two platforms at the moment.

If you use RAGatouille in a Python script (making sure to have it inside if __name__ == "__main__":), it should hopefully run fine (though again, not tested on Windows)!
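As a rough illustration of the script layout described above, here is a minimal sketch. It reuses the from_pretrained and index calls from the notebook cells quoted earlier in this thread; the helper names (ragatouille_available, build_index, main) and the full_document.txt file name are made up for this example, and it has not been tested on Windows.

```python
import importlib.util
from pathlib import Path

# Hypothetical input file (assumption): the text you want to index.
DOCUMENT_PATH = Path("full_document.txt")


def ragatouille_available() -> bool:
    """Return True if the ragatouille package is importable (illustrative helper)."""
    return importlib.util.find_spec("ragatouille") is not None


def build_index(documents, index_name="Miyazaki"):
    """Index `documents` with the same calls as the notebook quoted in this thread."""
    # Imported lazily, so that any worker process which re-imports this file
    # does not load the library as a side effect of the import.
    from ragatouille import RAGPretrainedModel

    rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    return rag.index(
        collection=documents,
        index_name=index_name,
        max_document_length=180,
        split_documents=True,
    )


def main():
    if not ragatouille_available():
        print("ragatouille is not installed; nothing to index")
        return None
    if not DOCUMENT_PATH.is_file():
        print(f"{DOCUMENT_PATH} not found; nothing to index")
        return None
    full_document = DOCUMENT_PATH.read_text(encoding="utf-8")
    return build_index([full_document])


# The guard keeps the indexing call from running again if this file is
# re-imported by a child process instead of being run directly.
if __name__ == "__main__":
    main()
```

Save this as a .py file and run it from a terminal rather than from a notebook cell. Nothing at module level starts the indexing, so only the directly executed script triggers it.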

@vanetreg
Author

vanetreg commented Jan 6, 2024

Hey @bclavie ,
I'm gonna test it on Replit during the weekend and will be back with the result.

@vanetreg
Author

vanetreg commented Jan 8, 2024

Hey @bclavie,
after 2 days I found out that Replit doesn't handle Jupyter notebooks properly, so I can't test RAGatouille there. Since Google Colab isn't an option either, I'll wait for the Windows fix :)

@jponline77

I was using Windows 11, Cursor, and Python 3.10 through WSL, and it worked for me. So it may be a Windows-outside-WSL thing. I have to say, at this point it would be hard for me to imagine not working in WSL on a Windows machine.

@jponline77 I tested it in both VSC and Cursor, both with the WSL extension installed. Maybe the Windows version (10 / 11) matters?

Yeah, maybe it's a Windows 10 issue. Just be sure, if you are using WSL, that it's actually running in WSL. If you are set up to run in WSL, you should be able to run it from the command line in WSL directly, without using VSC or Cursor. My experience with WSL is that it runs everything that runs on Ubuntu in much the same way as a standalone Linux system would, so it would surprise me a little if Windows 10 vs. 11 mattered. That said, any reason you aren't interested in upgrading to 11? I've now got RAGatouille running on two different systems with Windows 11 and WSL. One was a laptop with a low-end integrated GPU and 16GB of memory. It did take 10 minutes to index a small file, but it worked.

@vanetreg
Author

vanetreg commented Jan 8, 2024

@jponline77
I won't upgrade this PC, neither the HW nor Windows 10, so if I don't find a proper online IDE / runtime with optional GPUs (I paid for Replit Core and am considering Google Colab Pro), I'll go for an M2 Mac mini :)

@bclavie
Collaborator

bclavie commented Jan 8, 2024

Hey, thanks for this @jponline77 -- indexing is slow sadly, taking a while to create the index is the tradeoff to querying very large corpuses at near-constant time. It can maybe be optimised though (that'd require work on the upstream ColBERT repo), but that's something for the future!
I'm working on a feature to do index-free search. It's not very scalable, at least at the moment (you could query maybe up to 1k documents in >1s on a T4 GPU, and it obviously gets slower every time you add something), but for smaller corpuses it will make it easy to try it out!


@vanetreg I think (not sure) you could try it out in a standalone script like I mentioned earlier? Wrap it in if __name__ == "__main__":... It's not ideal for interactivity but it could work! (At least it does on every non-windows platform I've tried). Anyhow, the Mac Mini is an excellent choice 😄

@jponline77

Hey, thanks for this @jponline77 -- indexing is slow sadly, taking a while to create the index is the tradeoff to querying very large corpuses at near-constant time.

I was actually a little surprised it worked at all on the laptop. Indexing speed was much faster on my RTX4080 system with 128GB of ram :)

@bclavie changed the title from "Indexing can not be completed" to "Indexing can not be completed (on Windows)" on Jan 9, 2024
@fblissjr

fblissjr commented Jan 9, 2024

I'm getting the hang on WSL2 Ubuntu (Win 11) as well, both in a notebook and as a standalone Python script (including when wrapped in main). I've been using CUDA + PyTorch in WSL2 for a long time and this is the first time I've seen this NCCL issue pop up, so I'm trying to trace where it might be coming from.

Pretty sure it's something to do with nccl, and likely colbert (edit: although the colbert notebook posted by @okhat above works fine).

my best guess so far is https://github.com/stanford-futuredata/ColBERT/blob/03fb1becb30c1d01e83d210ba0c4a25108543809/colbert/utils/distributed.py#L27

edit:
I get this error after running RAG.index in example 1, and with any other RAG.index call:

torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.18.1
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error.
Last error:
socketStartConnect: Connect to 11.16.94.50<60757> failed : Software caused connection abort
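The error message itself suggests rerunning with NCCL_DEBUG=INFO. One way to do that from Python, as a sketch: set the variable at the very top of the script or notebook, before any distributed work starts, so that NCCL sees it when it initialises. Only the variable name and value come from the message above; the rest is illustrative.

```python
import os

# Set this first thing, before NCCL is initialised by torch.distributed.
os.environ["NCCL_DEBUG"] = "INFO"

# ... then import ragatouille and call RAG.index(...) exactly as before.
print("NCCL_DEBUG =", os.environ["NCCL_DEBUG"])
```

Child processes inherit the environment, so the setting should also apply to workers started from this process.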

@bclavie
Collaborator

bclavie commented Jan 14, 2024

Multiprocessing is no longer enforced for indexing when using no GPU or a single GPU thanks to @Anmol6's excellent upstream work on stanford-futuredata/ColBERT#290 & propagated by #51.

This is likely to fix the indexing problems on Windows (or at least, one of the problems). Please let me know if the latest version of RAGatouille fixes it for you!
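To see which case applies on a given machine, a small check along these lines may help. It is only a sketch: the helper names are made up, it assumes the package is installed under the distribution name `ragatouille`, and it treats a missing torch install as "no GPU".

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def visible_gpu_count() -> int:
    """Number of CUDA devices torch can see (0 if torch is not installed)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


gpus = visible_gpu_count()
print("ragatouille version:", installed_version("ragatouille"))
print("visible GPUs:", gpus)
if gpus <= 1:
    # Per the comment above: with no GPU or a single GPU,
    # multiprocessing is no longer enforced for indexing.
    print("indexing should not need multiprocessing here")
else:
    print("several GPUs visible; multiprocessing may still be used")
```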

@vanetreg
Author

@bclavie
I updated to the latest version, 0.0.4b2 (on Win 10, Python 3.11.6), and after loading dotenv (setting HF_HUB_DISABLE_SYMLINKS_WARNING=true) I still get errors while running:

from ragatouille import RAGPretrainedModel
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

The error messages are:

CalledProcessError                        Traceback (most recent call last)
Cell In[3], line 3
      1 from ragatouille import RAGPretrainedModel
----> 3 RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
...
File d:\Projects\AI_testing\RAGatouille\ragatouille-venv\Lib\site-packages\torch\utils\cpp_extension.py:2382, in _write_ninja_file(path, cflags, post_cflags, cuda_cflags, cuda_post_cflags, cuda_dlink_post_cflags, sources, objects, ldflags, library_target, with_cuda)
   2380 link_rule = ['rule link']
   2381 if IS_WINDOWS:
-> 2382     cl_paths = subprocess.check_output(['where',
   2383                                         'cl']).decode(*SUBPROCESS_DECODE_ARGS).split('\r\n')
   2384     if len(cl_paths) >= 1:
   2385         cl_path = os.path.dirname(cl_paths[0]).replace(':', '$:')

File ~\AppData\Local\Programs\Python\Python311\Lib\subprocess.py:466, in check_output(timeout, *popenargs, **kwargs)
    463         empty = b''
    464     kwargs['input'] = empty
--> 466 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
    467            **kwargs).stdout

File ~\AppData\Local\Programs\Python\Python311\Lib\subprocess.py:571, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    569     retcode = process.poll()
    570     if check and retcode:
--> 571         raise CalledProcessError(retcode, process.args,
    572                                  output=stdout, stderr=stderr)
    573 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

@bclavie
Collaborator

bclavie commented Jan 16, 2024

I think this is an issue with Windows 10 and loading cpp extensions in PyTorch? Saw a few similar issues on other projects floating around... I think the current stance will be that the lib doesn't support Win10 unless someone can figure out a solid fix to this 😞

@nmstoker

nmstoker commented Feb 9, 2024

In case others are trying to get it working on Windows 10: I did get past the cl error (the non-zero exit status above) by installing the C++ parts of VS 2022 Build Tools, but I then ran into issues with pthread.h not being found. I tried installing it with vcpkg (which was possible), but I still couldn't get it to work with the compiler. When I saw that cpp_extensions now seems archived, that, along with the time and effort taken to get to that point, made me give up on Windows directly (for now at least!).

However, I didn't have any problems with ragatouille using WSL on Windows (Ubuntu 20.04) via pip install ragatouille within a conda env with Python 3.11.7.
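A quick way to check for the first of those problems up front, sketched in Python: the traceback above fails because `where cl` finds no MSVC compiler, and `shutil.which("cl")` does an equivalent PATH lookup without raising. The helper name is made up. Finding `cl` only gets past that first error; it says nothing about the pthread.h problem.

```python
import shutil
from typing import Optional


def find_msvc_compiler() -> Optional[str]:
    """Return the full path to the MSVC compiler (cl) if it is on PATH."""
    return shutil.which("cl")


cl_path = find_msvc_compiler()
if cl_path is None:
    # Same situation as the CalledProcessError from `where cl` above.
    print("cl not found on PATH; PyTorch cannot compile its C++ extensions")
else:
    print("cl found at:", cl_path)
```

If the Build Tools are installed but `cl` is still not found, running Python from a Visual Studio developer command prompt is one way to get it onto PATH.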

@grahama1970

grahama1970 commented Feb 13, 2024

I'm having similar issues. I'm using WSL2 on Windows 10 with faiss-gpu installed and faiss-cpu uninstalled. The basic script below has been running for 30 minutes...
I have 256GB RAM and 24GB of GPU RAM.

[Feb 13, 18:25:18] [0]           #> Encoding 81 passages..
[Feb 13, 18:25:20] [0]           avg_doclen_est = 129.82716369628906        len(local_sample) = 81
[Feb 13, 18:25:20] [0]           Creating 1,024 partitions.
[Feb 13, 18:25:20] [0]           *Estimated* 10,516 embeddings.
[Feb 13, 18:25:20] [0]           #> Saving the indexing plan to .ragatouille/colbert/indexes/Miyazaki/plan.json ..

After about 30 minutes, I got the error:

WARNING clustering 9991 points to 1024 centroids: please provide at least 39936 training points
Clustering 9991 points in 128D to 1024 clusters, redo 1 times, 20 iterations
  Preprocessing in 0.00 s
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 128) x (1024, 128)' = (512, 1024) gemm params m 1024 n 512 k 128 trA T trB N lda 128 ldb 128 ldc 1024
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

By contrast, the T4 GPU Colab ran very quickly, in around 5 minutes.
Any ideas?
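For what it's worth, the numbers in that log are internally consistent. The sketch below reproduces them under two assumptions that are not stated in this thread: that ColBERT picks the number of partitions as the largest power of two not exceeding 16 * sqrt(estimated embeddings), and that faiss warns when there are fewer than 39 training points per centroid (its default minimum, as far as I know). Variable names are illustrative.

```python
import math

# Values copied from the log above.
num_passages = 81
avg_doclen_est = 129.82716369628906

# Estimated number of token embeddings in the collection.
num_embeddings_est = num_passages * avg_doclen_est

# Assumed rule: largest power of two <= 16 * sqrt(estimated embeddings).
num_partitions = 2 ** math.floor(math.log2(16 * math.sqrt(num_embeddings_est)))

# Assumed faiss default: at least 39 training points per centroid.
MIN_POINTS_PER_CENTROID = 39
min_training_points = MIN_POINTS_PER_CENTROID * num_partitions

print(f"{int(num_embeddings_est):,} embeddings")        # 10,516 embeddings
print(f"{num_partitions:,} partitions")                 # 1,024 partitions
print(f"{min_training_points} training points wanted")  # 39936 training points wanted
```

If those assumptions hold, the clustering warning just reflects a small collection, and the actual crash is the CUBLAS assertion that follows it.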

@bclavie
Collaborator

bclavie commented Feb 15, 2024

(Copy/pasting this message in a few related issues)

Hey guys!

Thanks a lot for bearing with me as I juggle everything and try to diagnose this. It’s complicated to fix with relatively little time to dedicate to it, as it seems like the dependencies causing issues aren’t the same for everyone, with no clear platform pattern as of yet. Overall, the issues center around the usual suspects of faiss and CUDA.

Because of this, I can’t fix the issue with PLAID-optimised indices just yet. However, I’m also noticing that most of the bug reports here are about relatively small collections (100s-to-low-1000s). To lower the barrier to entry as much as possible, #137 is introducing a second index format, which doesn’t actually build an index but performs an exact search over all documents (as a stepping stone towards #110, which would use an HNSW index as a compromise between PLAID optimisation and exact search).
This approach doesn’t scale, but offers the best possible search accuracy & is still performed in a few hundred milliseconds at most for small collections. Ideally, it’ll also open up the way to shipping lower-dependency versions (#136)
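To make "exact search over all documents" concrete, here is a toy sketch of ColBERT-style late-interaction (MaxSim) scoring applied by brute force to every document. It is not the code from #137: the function names and the tiny hand-written 2-dimensional token embeddings are invented for illustration, and a real implementation would use batched tensor operations on model-produced embeddings.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """For each query token take its best match in the document, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def exact_search(
    query_tokens: Sequence[Vector],
    docs: Sequence[Sequence[Vector]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Score every document (no index involved) and return the top k."""
    scored = [(doc_id, maxsim_score(query_tokens, doc)) for doc_id, doc in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Each document is a list of token embeddings (made-up values).
docs = [
    [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    [[0.0, -1.0]],             # points away from the second query token
]
query = [[1.0, 0.0], [0.0, 1.0]]

print(exact_search(query, docs, k=2))  # [(0, 2.0), (1, 1.0)]
```

Every query touches every token of every document, which is why this works well for small collections and stops scaling as the collection grows.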

The PR above (#137) is still a work in progress, as it needs CRUD support, tests, documentation, better precision routing (fp32/bfloat16), etc. (and potentially searching only a subset of document ids).
However, it’s working in a rough state for me locally. If you’d like to give it a try (with the caveat that it might very well break!), please feel free to install the library directly from the feat/full_vectors_indexing branch and add the following argument to your index() call:

index(…
index_type="FULL_VECTORS",
)

Any feedback is appreciated, as always, and thanks again!

@bclavie
Collaborator

bclavie commented Mar 18, 2024

The CUBLAS errors turned out to be faiss/driver incompatibility issues for most people. This should be fixed by the new experimental default indexing in 0.0.8, which skips faiss (it does K-means in pure PyTorch) as long as you're indexing fewer than ~100k documents!

@bclavie closed this as completed on Mar 18, 2024

10 participants