Music recommender system

This project is based on a study course called Recommender Systems in Production.

The author of the original course repository is Nikolay Anokhin. With his permission I am publishing my fork of his private repository.

This solution ranked second among the students, even though I used only simple heuristics.

General info

Users come to the Botify music service to listen to music. The user picks the starting track themselves. Once that track has been played, the service recommends the next one. The user can listen to the recommended track or skip it and move on to the next one; at some point the user may get bored and leave. How the user acts depends on the quality of the recommendations: if they are bad, the user leaves quickly; if they are good, the user stays hooked. The diagram below shows one user session.

User interaction with the Botify recommender

flowchart TD
    subgraph id2 [<b>Botify</b>]

    B1(<b>Botify</b> recommends the next track)
    style B1 fill:#ffd088,stroke:#a05500
    end
    subgraph id1 [Sim]

    S1((Session start)) --> S2(User selects the first track t<sub>0</sub>)
    S2 --> S3(User listens to track t<sub>i</sub>)
    style S3 fill:#ffd088,stroke:#ff9a00
    S3 ---> S4{Bored?}
    S4 -->|Yes|S5((End of session))
    S4 -->|No|S6(User requests the next track)
    S7(User gets a new track t<sub>i + 1</sub>) --> S3

    style S1 fill:#ffbbbb,stroke:#ff0000
    style S2 fill:#ffd088,stroke:#a05500
    style S3 fill:#ffd088,stroke:#a05500
    style S4 fill:#ffd088,stroke:#a05500
    style S5 fill:#ffbbbb,stroke:#ff0000
    style S6 fill:#ffd088,stroke:#a05500
    style S7 fill:#ffd088,stroke:#a05500
    end

    style id1 fill:#bbeeff,stroke:#0000ff
    style id2 fill:#bbffbb,stroke:#00ee00
    S6 --> B1
    B1 --> S7

The goal of the Botify service is to keep the user listening for as long as possible. For each track in a session, the service measures what percentage of the track was played and sums these percentages to get the total session length (we ignore the fact that tracks differ in duration; songs usually last about 3 minutes). The quality of recommendations directly affects how many tracks users listen to and how much of each track they play. It therefore determines the session length and the success of the service as a whole.
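As a rough illustration of the session-length metric described above, here is a minimal sketch. It assumes each track's listened share is stored as a fraction between 0 and 1 rather than a percentage; the function name `session_time` and the sample values are hypothetical and not taken from the botify or sim code.

```python
from typing import Iterable


def session_time(listened_fractions: Iterable[float]) -> float:
    """Sum the listened share of every track in one session.

    Each value is clamped to [0, 1], so a track counts at most once,
    whatever its real duration is.
    """
    return sum(min(max(fraction, 0.0), 1.0) for fraction in listened_fractions)


# Hypothetical session: a full listen, a quick skip, and a half listen.
session = [1.0, 0.1, 0.5]
total = session_time(session)
print(f"session length: {total:.2f} tracks")
```

Under this definition a session of ten quickly skipped tracks can be shorter than a session of two tracks played to the end, which is why skips hurt the metric.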

Modules description

botify

Recommendation service.

Description and instructions

sim

Because the recommender is educational, it is not deployed to real users, but we still want to experiment with it as if it were. For this purpose, the sim module implements a user simulator. Running the simulator generates traffic similar to that of real users.

Description and instructions

Jupyter

Notebooks with data pre-processing for the simulator, visualization, A/B experiments, etc.

Prerequisites

Install Docker
Create a virtualenv with Python (version 3.7 is recommended)
Optional: if you prefer conda to pip, you can use report/requirements.txt to create a conda environment.
To use this project, follow the instructions in the README files of the modules.

General instructions

Adjust the paths for your OS (I use Windows)
Run the Docker container from the botify module
Run manual or simulation mode with the sim module
Save the logs
Use the "AB experiment" notebook to check performance if you are running a new experiment.

My contribution

Translated the how-to READMEs from Russian and added a pinch of Mermaid.
Modified the botify module (I did not touch the sim module).

Main ideas:
- Use a different model when quality is reduced.
- Filter out tracks and artists the user has already listened to.

Details:
- Use the contextual model as a baseline.
- Use TopPop if quality is reduced.
- Three experiments were conducted with different top lengths (1000, 100, 80). The best result came from the model that uses data from the Collaborative recommender and recommends only the first 100 tracks from the top.
- Two filters were written that use the listening history.
- Implemented saving the tracks a user has listened to in Redis.
- The AB-experiments notebook was edited to make results easier to obtain, and a cumulative-sum-over-time plot was added.
- The TopPop model analysis notebook was edited to make it easy to compare TopPop models built on different data.
- A small refactoring was done to simplify adding experiments.
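The two main ideas above (fall back to a popularity top when quality drops, and filter out what was already heard) can be sketched roughly as follows. This is not the code from the botify module. It assumes that "quality is reduced" means the previous track was played for less than a threshold fraction, which the text above does not define, and it keeps the history in a plain set instead of Redis. `recommend_next`, `QUALITY_THRESHOLD`, and the `neighbours` mapping are hypothetical names.

```python
import random
from typing import Dict, Optional, Sequence, Set

TOP_SIZE = 100           # the best reported run used only the first 100 top tracks
QUALITY_THRESHOLD = 0.5  # hypothetical cut-off for "quality is reduced"


def recommend_next(
    prev_track: int,
    prev_track_time: float,
    neighbours: Dict[int, Sequence[int]],
    top_tracks: Sequence[int],
    history: Set[int],
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the next track: contextual candidates first, popularity top as fallback.

    Assumes top_tracks is non-empty and sorted from most to least popular.
    """
    rng = rng or random.Random()
    top = list(top_tracks[:TOP_SIZE])

    # Contextual step: use tracks tied to the previous one only if it went well.
    candidates = []
    if prev_track_time >= QUALITY_THRESHOLD:
        candidates = list(neighbours.get(prev_track, []))

    # History filter: never repeat a track the user has already heard.
    candidates = [track for track in candidates if track not in history]

    # Fallback: the truncated popularity top, filtered the same way.
    if not candidates:
        candidates = [track for track in top if track not in history]
    if not candidates:
        candidates = top  # everything was heard already, so allow repeats

    return rng.choice(candidates)
```

For example, `recommend_next(1, 0.9, {1: [2, 3]}, [10, 11, 12], {1, 2})` returns `3`, because track 2 is filtered out by the history and the previous track was played long enough to stay with the contextual candidates.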

All files in the Botify module have been changed or written from scratch.

Results

The best listening-time gain relative to the baseline was 63.003%. This result was obtained with the artist filter, under which each artist could appear only once.
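A filter of this kind might look like the sketch below. It is an illustration under assumptions, not the filter from the botify module: the `track_artist` mapping and the function name `filter_by_artist` are hypothetical, and tracks with an unknown artist are simply kept.

```python
from typing import Dict, Iterable, List, Set


def filter_by_artist(
    candidates: Iterable[int],
    track_artist: Dict[int, str],
    heard_artists: Set[str],
) -> List[int]:
    """Keep at most one track per artist and drop artists already heard."""
    seen = set(heard_artists)  # copy, so the caller's set is not modified
    kept = []
    for track in candidates:
        artist = track_artist.get(track)  # None when the artist is unknown
        if artist in seen:
            continue
        kept.append(track)
        if artist is not None:
            seen.add(artist)
    return kept


# Hypothetical catalogue: tracks 1 and 2 share an artist, "b" was already heard.
track_artist = {1: "a", 2: "a", 3: "b", 4: "c"}
print(filter_by_artist([1, 2, 3, 4], track_artist, {"b"}))
```

The order of the candidates is preserved, so when they arrive sorted by relevance the most relevant track of each artist is the one that survives.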

If you know Russian, you may check the report for a more detailed description of the result.

Further work and ideas

Save a list of "suitable" tracks for each user based on their listening history, and recommend the most frequent tracks in this list.
Improve the network architecture.
Add logging to count how many times TopPop is used.
Add other embeddings.
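One possible reading of the first idea (a per-user list of "suitable" tracks) is sketched below. It assumes that "suitable" tracks are the neighbours of every track the user has listened to. That is my interpretation for the sake of the example, and `update_suitable`, `most_frequent_unheard`, and the `neighbours` mapping are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def update_suitable(
    suitable: Counter,
    neighbours: Dict[int, Sequence[int]],
    listened_track: int,
) -> None:
    """Add the neighbours of a freshly listened track to the user's counter."""
    suitable.update(neighbours.get(listened_track, []))


def most_frequent_unheard(suitable: Counter, history: Set[int], k: int = 1) -> List[int]:
    """Return up to k tracks that were suggested most often and not yet heard."""
    ranked = [track for track, _ in suitable.most_common() if track not in history]
    return ranked[:k]


# Hypothetical neighbour lists: track 3 is suggested by both listened tracks.
neighbours = {1: [2, 3], 2: [3, 4]}
suitable = Counter()
for track in (1, 2):
    update_suitable(suitable, neighbours, track)

print(most_frequent_unheard(suitable, history={1, 2}))
```

A track suggested by several listened tracks rises to the top of the counter, so the recommendation reflects the whole session rather than only the previous track.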
