Fix engine name inconsistencies (e.g., Glaurung_2.2_CCRL 64-bit_4CPU) #45

Open
skiminki opened this issue Jan 8, 2025 · 5 comments
@skiminki
Collaborator

skiminki commented Jan 8, 2025

The engine names in the TCEC games PGNs generally use the format:

<engine-name> ' ' <version-string>

where <version-string> does not contain any spaces. That is, the final space in the player name separates the engine name and its version.

In certain events, the engine names in the source PGNs do not follow the usual formatting. For example, some games in TCEC_Season_26_-_Old_Vs_4k_Top_Bonus.pgn have player names such as Glaurung_2.2_CCRL 64-bit_4CPU and Crafty_25.2_CCRL 64-bit 4CPU.
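As a rough illustration of the convention and of how the two names above break it, here is a minimal Python sketch that splits a player name on its final space. The helper name `split_player_name` is hypothetical and is not taken from any script in this repository.

```python
def split_player_name(player: str) -> tuple[str, str]:
    """Split a player name on its final space into (engine name, version).

    A name without any space is treated as having no version.
    """
    name, sep, version = player.rpartition(" ")
    if not sep:
        return player, ""
    return name, version


for player in [
    "Glaurung 2.2",
    "Glaurung_2.2_CCRL 64-bit_4CPU",
    "Crafty_25.2_CCRL 64-bit 4CPU",
]:
    print(split_player_name(player))
```

For the first name this yields `('Glaurung', '2.2')`, while the two irregular names yield `('Glaurung_2.2_CCRL', '64-bit_4CPU')` and `('Crafty_25.2_CCRL 64-bit', '4CPU')`, so a reader relying on the final-space rule would report neither engine name nor version correctly.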

We should investigate:

  1. Whether these names should be unified in PGN postprocessing, or alternatively, fixed in the source (i.e., the web server archive); AND
  2. Whether other events also contain unusual engine naming.

Unusual naming may confuse programs that read the TCEC games PGNs. For example, the scripts behind the !h2h and !tdb TCEC chat commands currently report the engine names incorrectly.

@skiminki skiminki added the bug Something isn't working label Jan 8, 2025
@skiminki skiminki added this to the S27-interim-2 milestone Jan 8, 2025
@skiminki skiminki self-assigned this Jan 8, 2025
@robertnurnberg

A quick analysis of TCEC-everything.pgn from https://hoover.leelanet.fi/ gives these names for the two engines mentioned in the issue:

> grep '\"Glaurung' TCEC-everything.pgn | sed 's/Black //g' | sed 's/White //' | sort -u
["Glaurung 2.2"]
["Glaurung_2.2_CCRL 64-bit_4CPU"]
["Glaurung 2.2 JA"]
> grep '\"Crafty' TCEC-everything.pgn | sed 's/Black //g' | sed 's/White //' | sort -u
["Crafty 23.3"]
["Crafty 23.4"]
["Crafty 23.5"]
["Crafty 23.6"]
["Crafty 23.8"]
["Crafty 24.1"]
["Crafty_25.2_CCRL 64-bit 4CPU"]

@Aloril is that something you can fix in the TCEC web archive?

@robertnurnberg

Here is a list of potential problems. Each entry would contain a space in <engine-name> if the format mentioned above were followed (that is, with no spaces allowed in <version-string>).

> grep '\(White\|Black\) "' TCEC-everything.pgn | sed -E 's/\[(White|Black) "([^"]*)"\]/\2/g' | sed -E 's/(.*) .*$/\1/' | sort -u | grep " "
Antifish 1.0 Mark
Cheese 3.0
Crafty_25.2_CCRL 64-bit
DeepSjeng 3.6
Ethereal TCEC S20
Ethereal TCEC S20 DivP
Fritz in
Glaurung 2.2
Gull_20170410_CCRL 64-bit
Houdini 1.5a Sufi
Houdini 3 Sufi
Houdini 6.03 Sufi
Igel 2.1.2
Igel 3.0.5
Komodo 8 Sufi
Komodo 9.2 Sufi
Komodo 9.3 Sufi
Laser 1.8
Laser 1.8 beta
LCZero 0.7
LCZero copy
LCZeroCPU3pct v0.25-n591215
LCZero half
LCZero v0.21.1-nT40.T8.610 Sufi
Marvin 3.4.0
Minic 3.07
Rodent III
RubiChess 2.2-dev
Rybka 4
Rybka 4.1 Sufi
Sjeng c't
SlowChess Blitz
SlowChess Blitz 2.41
SlowChess Blitz 2.5
SlowChess Blitz 2.54
SlowChess Blitz 2.7
SlowChess Blitz 2.75
SlowChess Blitz 2.8
SlowChess Blitz 2.82
SlowChess Blitz 2.83
SlowChess Blitz 2.9
SlowChess Blitz Classic
SlowChess Blitz Classic 2.26
Stockfish 180614 Sufi
Stockfish 18102108 Sufi
Stockfish 190203 Sufi
Stockfish 260318 Sufi
Stockfish 6 Sufi
Stockfish 8 Sufi
Stockfish copy
Stockfish dev-20240605-5688b188
Stoofvlees II
Sufi10 Houdini
Sufi11 Stockfish
Sufi1&2 Houdini
Sufi12 Stockfish
Sufi13 Stockfish
Sufi14 Stockfish
Sufi15 LCZero
Sufi3 Rybka
Sufi4 Houdini
Sufi5 Komodo
Sufi6 Stockfish
Sufi7 Komodo
Sufi8 Komodo
Sufi9 Stockfish
The Baron
Toga II
Wasp 4.10
Wasp TCEC
Weiss 0.10-dev-20200525
Xiphos 0.6
Zappa Mexico
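For anyone who prefers to run this check without grep/sed, the pipeline above could be approximated in Python roughly as follows. This is only a sketch: the function name `suspicious_names` is made up for illustration, it only looks at tags that start a line, and Python's sort order may differ from a locale-aware `sort -u`.

```python
import re

# Matches a White or Black tag pair and captures the quoted player name.
TAG_RE = re.compile(r'^\[(?:White|Black) "([^"]*)"\]')


def suspicious_names(lines):
    """Return the sorted engine-name parts that still contain a space
    after the final space-separated token (the presumed version) is dropped."""
    found = set()
    for line in lines:
        m = TAG_RE.match(line)
        if not m:
            continue
        name, sep, _version = m.group(1).rpartition(" ")
        if sep and " " in name:
            found.add(name)
    return sorted(found)


sample = [
    '[White "Crafty_25.2_CCRL 64-bit 4CPU"]',
    '[Black "Glaurung 2.2 JA"]',
    '[White "Crafty 23.3"]',
    '[Event "TCEC"]',
]
print(suspicious_names(sample))
# → ['Crafty_25.2_CCRL 64-bit', 'Glaurung 2.2']
```

To run it on the full archive, the lines of TCEC-everything.pgn could be passed in place of `sample`, for example via an open file object.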

I also attach the output of

> grep '\(White\|Black\) "' TCEC-everything.pgn | sed 's/\[\(White\|Black\) "\([^"]*\)"]/\2/g' | sort -u > unique_engine_versions.txt
> wc -l unique_engine_versions.txt
2092 unique_engine_versions.txt

unique_engine_versions.txt

skiminki added a commit to skiminki/tcecgames that referenced this issue Jan 12, 2025
The newly added script parses the player name tag to three parts:
- engine name
- engine version
- event-specific special tag (e.g., "Sufi 4")

The script is not yet added in the Makefile PGN processing pipeline, as
the final output format has not yet been decided.

Fixes TCEC-Chess#45
@skiminki
Collaborator Author

Added a script to parse the white/black PGN tags (see the above commit). The script parses the name tag as three parts:

  • engine name
  • engine version (optional)
  • event-specific tag (optional)

Example:

$ cat TCEC-everything-bonus-test.pgn TCEC-compet.pgn | scripts/fix-engine-names.py | egrep '[[](White|Black) ' | sed -r -e 's/^.{8}//' -e 's/..$//' | sort -u
4ku
4ku (1.0)
4ku (2.0)
4ku (3.0)
4ku (3.1)
4ku (4.0)
4ku (5.0)
4ku (5.1)
A0lite (v0.1.1_BadGyal9_LittleEnder12p)
A0lite (v0.1.2_BadGyalXL9d)
...
Komodo (8)
Komodo (8) [Sufi 5]
Komodo (9.1)
Komodo (9.2)
Komodo (9.2) [Sufi 7]
Komodo (9.3) [Sufi 8]
Komodo (9.3x)
Komodo (9.42)
...
Stockfish (2020102823_nn-2eb2e0707c2b)
Stockfish (202011101829_nn-c3ca321c51c9)
Stockfish (20201123_nn-c3ca321c51c9)
Stockfish (20201225)
Stockfish (20210113)
Stockfish (2021013116) [01]
Stockfish (2021013116) [02]
Stockfish (2021013116) [03]
Stockfish (2021013116) [04]
...

This is far from complete. Remaining items:

  • Handling of all exceptions
  • Deciding the final output format

The current format is: <engine name> (<engine version>) [event tag]
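A consumer of this format could split it back into its three parts with something like the sketch below. The regular expression and the name `parse_formatted` are assumptions for illustration only and are not taken from scripts/fix-engine-names.py; the sketch also assumes that engine names contain no " (" or " [" sequences of their own.

```python
import re

# <engine name> (<engine version>) [event tag], with the last two parts optional.
FORMATTED_RE = re.compile(
    r'^(?P<name>.+?)'
    r'(?: \((?P<version>[^()]*)\))?'
    r'(?: \[(?P<tag>[^\[\]]*)\])?$'
)


def parse_formatted(player):
    """Return (name, version, tag); missing optional parts are None."""
    m = FORMATTED_RE.match(player)
    if m is None:
        raise ValueError(f"unrecognized player name: {player!r}")
    return m.group("name"), m.group("version"), m.group("tag")


for player in ["4ku", "4ku (1.0)", "Komodo (9.3) [Sufi 8]", "Stockfish (2021013116) [01]"]:
    print(parse_formatted(player))
```

With the sample names from the output above, `"Komodo (9.3) [Sufi 8]"` parses to `('Komodo', '9.3', 'Sufi 8')` and a bare `"4ku"` to `('4ku', None, None)`.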

Some candidates for the final format:

  • <engine name> (<engine version>) [event tag]
  • <engine name>/event_tag <engine_version>. For example:
    • "Stockfish 20201225"
    • "Stockfish/42 2021013116"
    • "Komodo/Sufi_8 9.38"
    • "The Baron 3.44.1"
    • "SlowChess Blitz_2.5_avx" (assuming "Blitz 2.5 avx" is the version)
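To make the second candidate concrete, here is one possible way to render it, assuming spaces inside the event tag and the version are replaced by underscores while spaces in the engine name are kept. The function name `format_candidate` is hypothetical, and the rule is inferred only from the examples listed above.

```python
def format_candidate(name, version=None, tag=None):
    """Render <engine name>/event_tag <engine_version>.

    Spaces in the tag and in the version become underscores, so the final
    space in the result still separates the name part from the version.
    """
    out = name
    if tag:
        out += "/" + tag.replace(" ", "_")
    if version:
        out += " " + version.replace(" ", "_")
    return out


print(format_candidate("Stockfish", "20201225"))
print(format_candidate("Stockfish", "2021013116", tag="42"))
print(format_candidate("Komodo", "9.38", tag="Sufi 8"))
print(format_candidate("The Baron", "3.44.1"))
print(format_candidate("SlowChess", "Blitz 2.5 avx"))
```

These calls reproduce the five example strings in the list above. One property of this variant is that it stays compatible with the existing "final space separates name and version" rule, which may matter for programs that already read the PGNs.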

@robertnurnberg

Nice work.

I didn't look at the code yet, but I do like the current format you suggest.

Just to clarify: this new format would appear in the PGNs for this repo, as the entries for White and Black?

And going forward the <engine_name> part (possibly with spaces replaced by underscores) would become the default output/input for the !h2h and !tdb commands?

skiminki added a commit to skiminki/tcecgames that referenced this issue Jan 19, 2025
@skiminki skiminki modified the milestones: S27-interim-2, S27-final Jan 25, 2025
@skiminki
Collaborator Author

Retargeting this issue for S27-final. I plan to make the S27-interim-2 release today before this issue is resolved.
