Download the data used here: https://www.kaggle.com/stackoverflow/pythonquestions/downloads/pythonquestions.zip/1
The pipeline to classify the data runs these scripts in order:
trim_data.py
word_weights.py
sentence_embeddings.py
train_model.py
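
The four steps above could be driven from one small script. This is a minimal sketch, assuming each script runs without command-line arguments from the current directory; PIPELINE and run_pipeline are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# Script names come from the list above. Running them with no
# arguments is an assumption.
PIPELINE = [
    "trim_data.py",
    "word_weights.py",
    "sentence_embeddings.py",
    "train_model.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and return the scripts that completed."""
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run after a failed stage.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling run_pipeline() runs all four stages. Passing a shorter list, such as PIPELINE[2:], would rerun only the later stages, provided the earlier outputs already exist.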
To see which tags are most frequent, run get_tag_counts.py.
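
Counting tag frequencies can be sketched as follows. This is not the code in get_tag_counts.py; it assumes the tags live in a CSV file with a column named "Tag" (one row per question and tag pair), and count_tags is a hypothetical helper.

```python
import csv
from collections import Counter


def count_tags(csv_file, column="Tag"):
    """Count how often each tag appears in an open CSV file.

    The column name "Tag" is an assumption about the dataset layout.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)
```

With a file opened as f, count_tags(f).most_common(10) would list the ten most frequent tags with their counts.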
For a baseline to compare your classifiers against, run dumb_algorithm.py and note how it performs.
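
What dumb_algorithm.py does is not described here. One common kind of naive baseline always predicts the most frequent label, and the sketch below shows that idea under the assumption of one label per example. majority_baseline_accuracy is a hypothetical name.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test example.

    Returns the chosen label and the accuracy on test_labels.
    Assumes a single label per example and non-empty inputs.
    """
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)
```

A trained classifier that does not clearly beat this accuracy has learned little beyond the label distribution.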