Unsupervised Product Matching Using Combinations and Permutations

A product matching algorithm that groups products into clusters of similarity based on their title using combinations and permutations.

Based on the paper "Unsupervised Product Matching Using Combinations and Permutations" by Leonidas Akritidis, Athanasios Fevgas, Panayiotis Bozanis and Christos Makris https://arxiv.org/pdf/1903.04276.pdf

Structure of Project

bin - Contains the executables
build - Contains the object files (.o)
include - Contains the header files
src - Contains the C++ code
build.bat - Builds the project and creates the main.exe file in bin (for Windows)
build.sh - Builds the project and creates the main.exe file in bin (for Ubuntu)
* Change the directory to the build directory build.bat or build.sh for your system

How to Use

Simply create a Product object that takes the vendor (the retailer to which the product is associated). The vendor parameter is a string. The next parameter is the title of the product, which is also a string. See main.cpp to see the test code.

How it Works

For each product, all of the combinations of the tokens (words) of the title are created up to n choose 6. Each token has its own properties, like its frequency and semantic value (item model, attribute or normal token) and based on this, they get different weights. The combinations also have their own properties, such as their Euclidean distance average and the frequency of the combination among the products. From this, using the equations outlined in the paper, each product is assigned a "dominating cluster", that is the combination that best represents what the product is. The products with the same cluster represent the same product.

Uses

This algorithm aims to solve the issue of how to deal with discrepincies among products across different retailers. Using this algorithm, it should be able identify the correct matching product among a wide range of products.

Reality

In contrast to the aim of this algorithm, it doesn't actually do its job. For example, running the algorithm on the set of data provided by the authors, it is very inaccurate, and it is not my implementation. I acheived similar, if not identical results using the author's algorithm, which can be found at https://github.com/lakritidis/UPM-full

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unsupervised Product Matching Using Combinations and Permutations

Structure of Project

How to Use

How it Works

Uses

Reality

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
bin		bin
build		build
include		include
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.bat		build.bat
build.sh		build.sh

License

Mascerade/upm

Folders and files

Latest commit

History

Repository files navigation

Unsupervised Product Matching Using Combinations and Permutations

Structure of Project

How to Use

How it Works

Uses

Reality

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages