Data from the Stack Exchange Data Dump
- Feature Extraction
- Learning-to-Rank (a sketch of how these features could be assembled for ranking follows the list below)
- User Features
  - user_age [1 numerical feature]
  - user_badge [categorical features]
  - user_reputation [1 numerical feature]
  - user_views [1 numerical feature]
  - user_votes [1 numerical feature]
- User-User Features
  - user-user interactions [1 numerical feature]
- Post Features
  - comment_cnt [1 numerical feature]
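As a concrete illustration of how the features above might feed the learning-to-rank stage, here is a minimal sketch that collects the numerical features of one answer into a flat vector. The JSON schema, the `train.json` file name, and the function name are assumptions made for illustration, not the repository's actual interface.

```python
# A minimal sketch, assuming each answer is stored as a JSON object with
# one key per extracted feature. Not the repository's actual schema.
import json

NUMERICAL_FEATURES = [
    "user_age", "user_reputation", "user_views", "user_votes",
    "user_user_interactions", "comment_cnt",
]

def build_feature_vector(answer):
    """Flatten one answer's numerical features into a list for the ranker.

    user_badge is categorical and would need an encoding (e.g., one-hot)
    before it could be appended to the numerical features.
    """
    return [float(answer.get(name, 0.0)) for name in NUMERICAL_FEATURES]

if __name__ == "__main__":
    # Hypothetical file layout: one JSON list of answer objects per split.
    with open("data/StackOverflow/train.json") as f:
        answers = json.load(f)
    vectors = [build_feature_vector(a) for a in answers]
    print(f"built {len(vectors)} feature vectors of length {len(NUMERICAL_FEATURES)}")
```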
- Download the data from the Stack Exchange Data Dump
- Unzip the downloaded archives into the raw/ directory
```
cd src/preprocess
./preprocess.py [name of dataset]
```
- Convert the format from XML to JSON
- Convert HTML-like contents into plaintext
- Link each question to the corresponding answers
- See data/[name of dataset]/question_answer_mapping.json after preprocessing
- Split the whole set into training and testing sets
- See data/[name of dataset]/train.* and data/[name of dataset]/test.*
- Questions without a best answer (the ground truth), as well as those with fewer than two answers, are removed (a sketch of these preprocessing steps follows this list)
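To make the steps above concrete, the sketch below parses Posts.xml, strips the HTML-like bodies to plaintext, and links each question to its answers. It assumes the standard Stack Exchange dump schema (row elements with Id, PostTypeId, ParentId, and Body attributes); it is an illustration, not the repository's actual preprocess.py, the HTML stripping is deliberately crude, and the questions.json file name is hypothetical.

```python
# A minimal sketch of the preprocessing steps, assuming the standard
# Stack Exchange dump schema. Not the repository's actual preprocess.py.
import html
import json
import re
import sys
import xml.etree.ElementTree as ET

def strip_html(body):
    """Crudely convert HTML-like content to plaintext: drop tags, unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", body)).strip()

def preprocess(dataset):
    questions = {}      # question id -> plaintext body
    qa_mapping = {}     # question id -> list of answer ids
    for _, row in ET.iterparse(f"raw/{dataset}/Posts.xml"):
        if row.tag != "row":
            continue
        post_id = row.get("Id")
        if row.get("PostTypeId") == "1":        # 1 = question
            questions[post_id] = strip_html(row.get("Body", ""))
            qa_mapping.setdefault(post_id, [])
        elif row.get("PostTypeId") == "2":      # 2 = answer
            qa_mapping.setdefault(row.get("ParentId"), []).append(post_id)
        row.clear()  # free processed rows to keep memory bounded on large dumps
    with open(f"data/{dataset}/question_answer_mapping.json", "w") as f:
        json.dump(qa_mapping, f)
    with open(f"data/{dataset}/questions.json", "w") as f:  # hypothetical name
        json.dump(questions, f)

if __name__ == "__main__":
    preprocess(sys.argv[1])  # e.g., ./preprocess.py StackOverflow
```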
Extract the user_age feature:
```
cd src/feature_extraction
./user_age.py [name of dataset]
```
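For reference, here is a sketch of what such a feature script might do. Whether user_age means the dump's self-reported Age attribute or the age of the account is an assumption; this sketch derives it from CreationDate in Users.xml, and the output file name is hypothetical. It is not the actual contents of src/feature_extraction/user_age.py.

```python
# A minimal sketch in the spirit of user_age.py, deriving one numerical
# feature per user from the account's creation date. Assumptions: the
# CreationDate attribute exists (standard in Stack Exchange dumps) and
# the output path/name, which are not specified by the repository.
import datetime
import json
import sys
import xml.etree.ElementTree as ET

def extract_user_age(dataset):
    now = datetime.datetime.utcnow()
    ages = {}
    for _, row in ET.iterparse(f"raw/{dataset}/Users.xml"):
        if row.tag != "row":
            continue
        created = row.get("CreationDate")
        if created:
            # CreationDate looks like "2010-07-28T16:38:27.683"
            dt = datetime.datetime.fromisoformat(created)
            ages[row.get("Id")] = (now - dt).days  # account age in days
        row.clear()
    with open(f"data/{dataset}/user_age.json", "w") as f:  # hypothetical name
        json.dump(ages, f)

if __name__ == "__main__":
    extract_user_age(sys.argv[1])  # e.g., ./user_age.py StackOverflow
```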
The following descriptions briefly explain the purpose of each directory.
- raw/: The directory for the raw data (e.g., Posts.xml, Users.xml)
- raw/[name of dataset]/: The corresponding raw data for a given dataset (e.g., StackOverflow)
- Note that the file names should not be modified.
- data/: The directory for the preprocessed data
- data/[name of dataset]/: The corresponding preprocessed data for a given dataset (e.g., StackOverflow)
- src/: The directory for all source code
- src/preprocess/: Code for preprocessing the raw data
- model/: The directory for models trained on a large English corpus