Assignment Task for Data Scientists

Introduction

You are a Data Scientist working at upday, a news aggregator and recommender.

The engineering team at upday regularly gathers articles from across the Web. To provide proper filtering functionality in the app, the articles need to be categorized.

You have at your disposal a pre-labelled dataset that maps different articles and their metadata to a specific category.

It is now up to you to help the company by providing a solution for automatically categorizing articles.

Assignment

The repository contains a dataset of English articles with the following fields for each one:

category
title
text
url

The purpose of the task is to provide a classification model for the articles.
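
To make the task concrete, here is a minimal sketch of a learned (not rules-based) text classifier: a multinomial Naive Bayes model written with the standard library only. The class name `NaiveBayesClassifier`, the `tokenize` helper, and the toy rows are all assumptions made for illustration; they are not taken from the dataset or the repository, and a library such as scikit-learn would be a more usual choice in practice.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, document):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(document) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy rows standing in for the real dataset (title and text joined).
train_docs = [
    "Team wins championship final after penalty shootout",
    "Striker scores twice as club reaches cup final",
    "Central bank raises interest rates to curb inflation",
    "Stock markets fall as investors fear recession",
]
train_labels = ["sports", "sports", "business", "business"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
prediction = model.predict("Club striker scores in the final")
```

On the real data, the `title` and `text` fields could be concatenated into one string per article before fitting, with `category` as the label.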

Instructions

You should make a pull request to this repository containing the solution. If you need to clarify any point of your solution, please include an explanation in the PR description.

What we expect:

An explanation of the solution you adopted and the results of your data exploration
Documentation of your model's results, including the metrics you chose and the final evaluation
The training and evaluation code
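
As one possible way to document results, the sketch below computes accuracy and macro-averaged F1 from true and predicted labels. The choice of these two metrics and the function names are assumptions for illustration; the assignment leaves the metrics up to you. The label lists are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 score for every label that appears in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels for illustration only.
y_true = ["sports", "sports", "business", "business", "politics"]
y_pred = ["sports", "business", "business", "business", "politics"]

acc = accuracy(y_true, y_pred)
f1_by_class = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
```

Macro-F1 weights every category equally, which can be more informative than accuracy when some categories are much rarer than others.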

The solution only needs to perform better than random. We also expect you to use a model that is not purely rules-based.
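
"Better than random" can be made measurable by computing the accuracy that trivial strategies would reach on the same labels. The sketch below shows three such reference points; the function names and the label distribution are hypothetical and chosen only for illustration.

```python
from collections import Counter


def uniform_random_accuracy(labels):
    """Expected accuracy when guessing each of the K classes with probability 1/K."""
    return 1 / len(set(labels))


def stratified_random_accuracy(labels):
    """Expected accuracy when guessing classes in proportion to their frequency."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) ** 2 for c in counts.values())


def majority_class_accuracy(labels):
    """Accuracy when always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Made-up label distribution for illustration only.
labels = ["sports"] * 5 + ["business"] * 3 + ["politics"] * 2

uniform = uniform_random_accuracy(labels)
stratified = stratified_random_accuracy(labels)
majority = majority_class_accuracy(labels)
```

On imbalanced data the majority-class figure is often the hardest of the three to beat, so it is a useful number to report next to the model's accuracy.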

How you present the documentation and the code is up to you: you may provide one or more Jupyter notebooks or use other means.

Bonus

Scripts to be run from the command line:

A script for training the model on the dataset
A script for evaluating the model on the dataset
A script to infer the category given an article
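
The command-line interface for these three tasks could be sketched with `argparse` as follows. This version uses a single entry point with `train`, `evaluate`, and `predict` subcommands instead of three separate scripts; that layout, along with every flag name and file name shown, is an assumption for illustration and not something the assignment prescribes.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per task (hypothetical flag names)."""
    parser = argparse.ArgumentParser(
        description="Article categorization tools (illustrative sketch)."
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Fit a model on a labelled dataset.")
    train.add_argument("--data", required=True, help="Path to the labelled dataset.")
    train.add_argument("--model-out", required=True, help="Where to save the fitted model.")

    evaluate = subparsers.add_parser("evaluate", help="Score a saved model on a dataset.")
    evaluate.add_argument("--data", required=True, help="Path to the labelled dataset.")
    evaluate.add_argument("--model", required=True, help="Path to a saved model.")

    predict = subparsers.add_parser("predict", help="Infer the category of one article.")
    predict.add_argument("--model", required=True, help="Path to a saved model.")
    predict.add_argument("--title", default="", help="Article title (optional).")
    predict.add_argument("--text", required=True, help="Article body text.")

    return parser


# Parse an explicit argument list, as a shell invocation would supply it.
args = build_parser().parse_args(
    ["predict", "--model", "model.pkl", "--text", "Stocks fall as rates rise"]
)
```

Each subcommand would then dispatch to the corresponding training, evaluation, or inference function.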


thegreatwarlo/data-science-assignment-task
