index 18509 is out of bounds for axis 1 with size 13293 - error while creating lda model using gensim #160

lonewolf06 · 2020-01-17T19:04:11Z

Hello,

I am getting the error in the title while trying to create an LDA model for topic modeling of customer comments. I am new to Python, so I wasn't able to debug the issue. Any help is much appreciated! Below is the code:

import re
import gensim
from gensim import corpora
from nltk.tokenize import word_tokenize

# function to clean up the data
def clean_text(text):
    tokenized_text = word_tokenize(text.lower())
    cleaned_text = [t for t in tokenized_text if t not in stopwords_hotel and re.match('[a-zA-Z-][a-zA-Z-]{2,}', t)]
    return cleaned_text
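As a rough illustration of what the filter in `clean_text` keeps, here is a minimal sketch that swaps NLTK's `word_tokenize` for a plain `str.split` and uses a small hypothetical stopword set in place of `stopwords_hotel`. One detail worth knowing: `re.match` only anchors at the start of the token, so a token such as `abc123` still passes the pattern.

```python
import re

# Hypothetical stand-in for stopwords_hotel (not taken from the original post).
STOPWORDS = {"the", "was", "and"}

# Same pattern as in clean_text: a letter or hyphen followed by 2+ letters/hyphens,
# matched only at the START of the token (re.match semantics).
PATTERN = re.compile(r"[a-zA-Z-][a-zA-Z-]{2,}")


def clean_text_sketch(text):
    # str.split() is a simplification; the original uses nltk's word_tokenize.
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOPWORDS and PATTERN.match(t)]


print(clean_text_sketch("The room was clean and the staff friendly abc123 ok"))
# → ['room', 'clean', 'staff', 'friendly', 'abc123']
```

If the intent is to keep only purely alphabetic tokens, `PATTERN.fullmatch(t)` would reject `abc123`, while two-letter tokens like `ok` are already dropped by the `{2,}` quantifier.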

# data tokenization
tokenized_data_hotel = []
for text in df_hotel.customer_comments_lem:
    tokenized_data_hotel.append(clean_text(text))

# Build a Dictionary: associate each word with a numeric id
dictionary_hotel = corpora.Dictionary(tokenized_data_hotel)
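`corpora.Dictionary` assigns each distinct token an integer id, and for a freshly built dictionary those ids run from 0 to `len(dictionary) - 1`. The sketch below mimics that with a plain dict; the function name `build_vocab` is hypothetical, and its ids follow first appearance, which may not match gensim's exact ordering. The invariant that matters is that every valid id is smaller than the vocabulary size.

```python
def build_vocab(tokenized_docs):
    # Hypothetical stand-in for corpora.Dictionary: token -> consecutive integer id.
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)  # next unused id
    return token2id


docs = [["room", "clean", "staff"], ["staff", "friendly"], ["breakfast", "clean"]]
vocab = build_vocab(docs)
print(vocab)  # {'room': 0, 'clean': 1, 'staff': 2, 'friendly': 3, 'breakfast': 4}
print(len(vocab), max(vocab.values()))  # 5 4
```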

# Transform the collection of texts to a numerical form
corpus = [dictionary.doc2bow(text) for text in tokenized_data_hotel]

# creating bag-of-words corpus
corpus_bow_hotel = [dictionary.doc2bow(doc) for doc in tokenized_data_hotel]
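Both comprehensions call `dictionary.doc2bow`, while the model is given `id2word=dictionary_hotel`. If `dictionary` is a different `Dictionary` object (the post does not show where it is defined; it may be left over from an earlier cell), its ids need not fit inside `dictionary_hotel`. gensim's `LdaModel` sizes its topic-word matrix from `id2word`, so an id from a larger dictionary likely cannot index it, which matches the shape of the reported error (id 18509 against an axis of size 13293). The sketch below uses plain dicts as hypothetical stand-ins for the two dictionaries and a nested list in place of that matrix.

```python
from collections import Counter


def doc2bow_sketch(vocab, doc):
    # Hypothetical stand-in for Dictionary.doc2bow: sorted (token_id, count)
    # pairs; tokens missing from vocab are silently dropped.
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Hypothetical vocabulary built from the hotel comments...
hotel_vocab = {"room": 0, "clean": 1, "staff": 2}
# ...and an unrelated, larger one (e.g. a `dictionary` from an earlier cell).
other_vocab = {"flight": 0, "delay": 1, "seat": 2, "gate": 3,
               "room": 4, "clean": 5, "staff": 6}

doc = ["room", "clean", "clean", "staff"]
print(doc2bow_sketch(hotel_vocab, doc))  # [(0, 1), (1, 2), (2, 1)]
print(doc2bow_sketch(other_vocab, doc))  # [(4, 1), (5, 2), (6, 1)]

# One column per hotel_vocab entry, standing in for the
# (num_topics x num_terms) matrix the model sizes from id2word.
num_topics = 4
topic_word = [[0.0] * len(hotel_vocab) for _ in range(num_topics)]

bad_ids = []
for token_id, _count in doc2bow_sketch(other_vocab, doc):
    try:
        _ = topic_word[0][token_id]
    except IndexError:
        bad_ids.append(token_id)  # id does not exist in the smaller vocabulary
print(bad_ids)  # [4, 5, 6]
```

If this is the cause, building the corpus with `dictionary_hotel.doc2bow(...)`, the same dictionary passed as `id2word`, should remove the mismatch.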

# topic modeling using bag of words
lda_model_bow_hotel = gensim.models.ldamodel.LdaModel(corpus=corpus_bow_hotel,
                                                      id2word=dictionary_hotel,
                                                      num_topics=4, per_word_topics=True)
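Before training, one way to catch an id/dictionary mismatch early is to confirm that every id in the bag-of-words corpus is smaller than the size of the dictionary passed as `id2word`. The helper below is a hypothetical sketch in plain Python; it only assumes the corpus is an iterable of lists of `(token_id, count)` pairs, which is the format `doc2bow` produces.

```python
def find_out_of_range_ids(corpus, num_terms):
    # Hypothetical helper: return (doc_index, token_id) pairs whose id
    # cannot index a vocabulary of size num_terms.
    problems = []
    for doc_index, bow in enumerate(corpus):
        for token_id, _count in bow:
            if not 0 <= token_id < num_terms:
                problems.append((doc_index, token_id))
    return problems


# Toy corpus reproducing the numbers from the error message.
toy_corpus = [[(0, 1), (2, 3)], [(1, 1), (18509, 2)]]
print(find_out_of_range_ids(toy_corpus, 13293))  # [(1, 18509)]
```

In the posted code this would be called as something like `find_out_of_range_ids(corpus_bow_hotel, len(dictionary_hotel))`; an empty list means every id fits the dictionary given to `LdaModel`.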
