Builds a database of Magic the Gathering card images with card name, set, and perceptual hash of artwork.

Machine-Learning-Labs/GathererImageGatherer

GathererImageGatherer

This project downloads all the card images from gatherer.wizards.com and saves them in the folder cardImages/ with their name and set.

The images can be used to build a database of perceptual hashes. Because visually similar images produce similar perceptual hashes, the hash of each card's artwork can be compared with the hash of a card in a photo to identify it. Once a card is identified, its details can be entered into http://shop.tcgplayer.com/magic so the user can quickly look up the price.
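To make the idea of a perceptual hash concrete, here is a minimal sketch of a simplified average hash in plain Python. It is an illustration only: the function name `average_hash` and the tiny 4x4 grids are assumptions made for this example, and the README does not say which hashing algorithm the project's scripts use through the imagehash library.

```python
def average_hash(pixels):
    """Return a simplified average hash of a grayscale grid as a hex string.

    `pixels` is a list of equal-length rows of brightness values (0-255).
    Hypothetical helper for illustration, not part of this repository.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: 1 if the pixel is brighter than the mean, else 0.
    bits = "".join("1" if value > mean else "0" for value in flat)
    # Pad the hex string so leading zero bits are not dropped.
    width = (len(bits) + 3) // 4
    return format(int(bits, 2), "0{}x".format(width))


# A made-up 4x4 "image" and a uniformly brighter copy of it.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
]
brighter = [[value + 20 for value in row] for row in original]

original_hash = average_hash(original)
brighter_hash = average_hash(brighter)
print(original_hash, brighter_hash)
```

Both grids hash to the same value because the hash depends only on which pixels are brighter than the mean, not on absolute brightness. That tolerance to small changes is what makes it possible to match a photographed card against a stored image.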

Dependencies

To run these programs you will need the Python libraries BeautifulSoup, requests, imagehash, PIL, and psycopg2. Install them with pip or conda:

    $> pip install -r requirements.txt
    or
    $> conda env create -f environment.yml

Then build and install the pg_similarity PostgreSQL extension:

    $> git clone https://github.com/eulerto/pg_similarity.git
    $> cd pg_similarity/
    $> USE_PGXS=1 make
    $> USE_PGXS=1 make install

In postgres:

    CREATE EXTENSION pg_similarity;

Use

Download Images

    $> python scrapeImages.py

This downloads all the card images from http://gatherer.wizards.com/Pages/Default.aspx and saves them in the folder cardImages/ with their name and set.

The resulting folder of pictures is 1.21 GB, and the download takes about 25 minutes.

Setup The Database

Once PostgreSQL is installed, create the database and the table that the Python scripts need.

    psql
    create database cardimages;
    \c cardimages
    create table phash(name text, set text, hash text);

Build The Database

    $> python buildDatabase.py

This populates a PostgreSQL database with the card name, set, and a perceptual hash of the artwork for each image downloaded with scrapeImages.py.
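The shape of the rows that end up in the phash table can be sketched as follows. This is not the code in buildDatabase.py: to stay runnable without a PostgreSQL server, it uses Python's built-in sqlite3 module as a stand-in, and the function name `store_hashes`, the card names, and the hash values are all made up for illustration. With psycopg2 the placeholders would be `%s` instead of `?`.

```python
import sqlite3


def store_hashes(connection, rows):
    """Insert (name, set, hash) tuples into a phash table.

    Hypothetical helper; sqlite3 stands in for PostgreSQL here.
    """
    # "set" is quoted because it is an SQL keyword.
    connection.execute(
        'CREATE TABLE IF NOT EXISTS phash(name text, "set" text, hash text)'
    )
    # Parameterized queries keep card names with quotes or commas safe.
    connection.executemany(
        'INSERT INTO phash(name, "set", hash) VALUES (?, ?, ?)', rows
    )
    connection.commit()


# Made-up example rows: card name, set name, hex perceptual hash.
example_rows = [
    ("Example Card A", "Example Set X", "5a3c"),
    ("Example Card B", "Example Set Y", "ffff"),
]

connection = sqlite3.connect(":memory:")
store_hashes(connection, example_rows)
row_count = connection.execute("SELECT COUNT(*) FROM phash").fetchone()[0]
print(row_count)
```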

Test A Card

    $> python queryDatabase.py
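The README does not describe how queryDatabase.py matches a card, so the following is only a sketch of one common approach: compare the query hash with every stored hash by Hamming distance (the number of differing bits) and keep the closest one under a threshold. The project itself installs the pg_similarity extension, which suggests the comparison may happen inside PostgreSQL instead. The function names, the threshold of 10 bits, and the sample rows are assumptions.

```python
def hamming_distance(hash_a, hash_b):
    """Count the differing bits between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def closest_card(query_hash, rows, max_distance=10):
    """Return (name, set, distance) for the nearest stored hash, or None.

    `rows` is an iterable of (name, set, hash) tuples, matching the
    columns of the phash table. Hypothetical helper for illustration.
    """
    best = None
    for name, set_name, stored_hash in rows:
        distance = hamming_distance(query_hash, stored_hash)
        # Keep the candidate only if it is within the threshold and
        # strictly closer than the best match seen so far.
        if distance <= max_distance and (best is None or distance < best[2]):
            best = (name, set_name, distance)
    return best


# Made-up rows standing in for the contents of the phash table.
rows = [
    ("Example Card A", "Example Set X", "5a3c"),
    ("Example Card B", "Example Set Y", "ffff"),
]

match = closest_card("5a3d", rows)
no_match = closest_card("0000", rows, max_distance=2)
print(match, no_match)
```

A threshold matters because a photo of a card never hashes to exactly the stored value. Too low a threshold misses real matches, and too high a threshold returns the wrong card.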

TODOs

  • Reorganize folders
  • Add Docker-Compose to develop without installing Postgres locally
  • Add a license (e.g. MIT)
  • Refactor the output paths for downloads
  • Add a way to resume downloads and avoid repetitions
  • Add a way to download from other sources (such as eBay or Google)
  • Refactor the download code to create a subfolder per card
  • Merge the several ways of building the dataset
  • Test all Python scripts to check that they still work after the path refactor
  • Finish Makefile
  • Add a way to automatically run the SQL script once
  • Update README with Makefile and new sections
  • Add a notebook example
