ISSN lister

This project aims to provide a fairly current list of valid ISSN. It was developed at the Internet Archive.

ISSN-LIST-DATE: 2024-08-11 -- download COUNT: 2380273

Publicly available metadata has been archived at:

International Standard Serial Number

An International Standard Serial Number (ISSN) is an eight-digit serial number used to uniquely identify a serial publication, such as a magazine.

Issuing organisation: issn.org.

The CIEPS, also known as the ISSN International Centre, is an intergovernmental organization which manages at the international level the identification and the description of serial publications and ongoing resources, print and online, in any subject.

Variants

  • E-ISSN (electronic), P-ISSN (print), ISSN-L (link)

As defined in ISO 3297:2007, every serial in the ISSN system is also assigned a linking ISSN (ISSN-L), typically the same as the ISSN assigned to the serial in its first published medium, which links together all ISSNs assigned to the serial in every medium.

Usage

$ issnlister -h
Usage of issnlister:
  -b int
        batch size per worker (default 100)
  -c string
        continue harvest into a given file (implies -m)
  -d string
        path to cache dir (default "/home/tir/.cache/issnlister")
  -i string
        path to file with ISSN to ignore, one ISSN per line, e.g. ...
  -l    list all cached issn, one per line
  -m    download public metadata in JSON format
  -q    suppress any extra output
  -s string
        the main sitemap (default "https://portal.issn.org/sitemap.xml")
  -ua string
        set user agent (default "issnlister/0.1.0 (https://github.com/miku/issnlister)")
  -version
        show version
  -w int
        number of workers (default 16)

Generate a new list

Update the list and README with a simple make issn.tsv (assuming sed, awk and sort are installed).

Start a harvest or continue a harvest

With -c you can start or continue an interrupted harvest into the same file.

$ issnlister -c file.ndj

Basic ISSN validation

def calculate_issn_checkdigit(s):
    """
    Given a string of seven digits, return the ISSN check digit as a string.
    """
    if len(s) != 7:
        raise ValueError('seven digits required')
    # Weight the digits 8, 7, ..., 2 from left to right and sum the products.
    ss = sum(int(digit) * f for digit, f in zip(s, range(8, 1, -1)))
    mod = ss % 11
    checkdigit = 0 if mod == 0 else 11 - mod
    # A check value of 10 is written as 'X'.
    if checkdigit == 10:
        checkdigit = 'X'
    return str(checkdigit)
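
The check digit can also be used to validate a complete ISSN. The following is a minimal sketch, not part of this repo: the names issn_checkdigit and is_valid_issn are made up for illustration, and it assumes input of the form NNNN-NNNC with an optional hyphen.

```python
import re

# Four digits, optional hyphen, three digits, then a digit or X as check character.
ISSN_PATTERN = re.compile(r"([0-9]{4})-?([0-9]{3})([0-9Xx])")


def issn_checkdigit(digits):
    """Return the check digit for the first seven digits of an ISSN."""
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(s):
    """Return True if s is shaped like an ISSN and its check digit matches."""
    match = ISSN_PATTERN.fullmatch(s.strip())
    if match is None:
        return False
    first, second, check = match.groups()
    return issn_checkdigit(first + second) == check.upper()


print(is_valid_issn("1932-6203"))  # True
print(is_valid_issn("1932-6204"))  # False
```

Note that this only tests the algorithm; a valid ISSN is not necessarily a registered one.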

Number of ISSN

  • ~2714711 (as of 2019-11-11 per website), but

The list grows by about 50k to 120k updates and additions per year.

$ curl -sL https://git.io/Jf8sa | wc -l
2139915

Upper limit of valid ISSN?

  • 10^7

Current probability that a random, valid ISSN is registered: ~0.213 (2020-05-12).
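
The upper limit follows from the check digit: the first seven digits can be chosen freely and the eighth is determined by them. A rough sketch of the arithmetic, assuming the line count from the curl example above is the number of registered ISSN; the result lands close to the figure quoted above, which was presumably computed from a slightly different count.

```python
# Seven freely chosen digits; the eighth (check) digit is determined by them.
upper_limit = 10 ** 7

# Assumption: registered count taken from the `wc -l` output of the curl example.
registered = 2139915

probability = registered / upper_limit
print(f"{probability:.3f}")
```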

Distribution

Snapshot, 2019-11-11, 15:00, UTC+1.

Formats

Various formats are available.

List of ISSN

List ISSN, quietly.

$ issnlister -l -q

All data is cached (XDG), by default under $HOME/.cache/issnlister/2019-11-11/... where raw downloads and combined data live.

Alternatively:

$ find ~/.cache/issnlister/2019-11-20 -name 'sitemap*xml' -exec 'cat' {} \; | \
    grep 'https://portal.issn.org/resource/ISSN/[^"]*' | \
    grep -oE '[0-9]{4}-[0-9]{3}[0-9xX]' | LC_ALL=C sort -u
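
The same extraction can be sketched in Python. This is an illustration only, not code from this repo: extract_issns is a hypothetical helper, the sample fragment is made up, and the sketch is slightly stricter than the shell pipeline because it only accepts an ISSN that directly follows the resource URL prefix.

```python
import re

# Matches resource URLs and captures the ISSN at the end of the path.
RESOURCE_RE = re.compile(
    r"https://portal\.issn\.org/resource/ISSN/([0-9]{4}-[0-9]{3}[0-9xX])"
)


def extract_issns(text):
    """Return the sorted, de-duplicated ISSN found in resource URLs in text."""
    return sorted(set(RESOURCE_RE.findall(text)))


# Hypothetical sitemap fragment, for illustration only.
sample = """
<url><loc>https://portal.issn.org/resource/ISSN/1932-6203</loc></url>
<url><loc>https://portal.issn.org/resource/ISSN/0030-1558</loc></url>
<url><loc>https://portal.issn.org/resource/ISSN/1932-6203</loc></url>
"""

print(extract_issns(sample))  # ['0030-1558', '1932-6203']
```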

Bulk ISSN check

There is a bulk ISSN check tool included in this repo, issncheck. It does not check whether an ISSN is "valid" (that's an algorithm), but whether it "actually exists" (which is a dataset; with some latency, since the binary embeds the data).

To build:

$ make issncheck

Feed it one ISSN per line on stdin; it will output a TSV with "0", "1" or "X" (unparsable) in the first column and the normalized value in the second.

$ head -10 sample.tsv
20140827
1932-6203
0000-0002
00032489
0030-1558
2009-1005
31735070
1476-2986
31735066
0000-0003

$ cat sample.tsv | ./issncheck
1       2014-0827
1       1932-6203
0       0000-0002
0       0003-2489
1       0030-1558
0       2009-1005
0       3173-5070
1       1476-2986
0       3173-5066
0       0000-0003
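
The behaviour shown above (normalize each line, then look it up in a dataset) can be approximated in a few lines of Python. This is a sketch under assumptions, not the actual issncheck implementation: normalize and check are hypothetical names, the known set stands in for the embedded data, and echoing the raw input for unparsable lines is a guess.

```python
import re

# Four digits, optional hyphen, three digits plus a digit or X.
ISSN_RE = re.compile(r"([0-9]{4})-?([0-9]{3}[0-9Xx])")


def normalize(line):
    """Return the ISSN in NNNN-NNNC form, or None if the line is unparsable."""
    m = ISSN_RE.fullmatch(line.strip())
    if m is None:
        return None
    return "{}-{}".format(m.group(1), m.group(2).upper())


def check(lines, known):
    """Yield (flag, value) pairs: "1" known, "0" unknown, "X" unparsable."""
    for line in lines:
        issn = normalize(line)
        if issn is None:
            yield "X", line.strip()
        elif issn in known:
            yield "1", issn
        else:
            yield "0", issn


# Hypothetical stand-in for the dataset embedded in the binary.
known = {"2014-0827", "1932-6203"}
for flag, value in check(["20140827", "1932-6203", "0000-0002", "n/a"], known):
    print(flag, value, sep="\t")
```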

Data point: The issncheck tool can verify about 700K ISSN per second on an i7-8550U.