A project using PySpark/Databricks to replicate and extend a research paper on an application of machine learning.
In this project, we replicated and extended the research paper "Categorizing the Content of GitHub README Files" using PySpark/Databricks. The original code by the paper's authors is available here.
- data_dumps - data dumps from the SQLite database
- manual_work - files used for manual work
- new_input_readmes - new README files used as input for the ML model
- notebooks - contains the code from the research paper adapted for Databricks, along with additional code (see the sketch after this list)
- other - contains the research paper, instructions, and the report template
- presentations - contains presentations and the report
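
For orientation, below is a minimal sketch of the kind of Spark ML text-classification pipeline used in the notebooks. It is not the authors' exact pipeline; the input path, column names, and choice of classifier are assumptions for illustration only.

```python
# Minimal, illustrative sketch of classifying README section text into
# content categories with Spark ML. Paths, column names, and the use of
# logistic regression are assumptions, not the paper's exact setup.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, HashingTF, IDF, StringIndexer
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("readme-categorization").getOrCreate()

# Hypothetical input: one row per README section, with its text and a manual label.
sections = spark.read.csv("new_input_readmes/sections.csv", header=True)  # columns: text, category

pipeline = Pipeline(stages=[
    RegexTokenizer(inputCol="text", outputCol="tokens", pattern="\\W+"),   # split on non-word chars
    StopWordsRemover(inputCol="tokens", outputCol="filtered"),             # drop common stop words
    HashingTF(inputCol="filtered", outputCol="tf"),                        # term-frequency vectors
    IDF(inputCol="tf", outputCol="features"),                              # reweight by inverse document frequency
    StringIndexer(inputCol="category", outputCol="label"),                 # map category names to numeric labels
    LogisticRegression(maxIter=20),                                        # simple baseline classifier
])

model = pipeline.fit(sections)
predictions = model.transform(sections)
predictions.select("text", "category", "prediction").show(5, truncate=60)
```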