Ingestion: NAACL 2024 #3388

Merged
mjpost merged 20 commits into master from naacl-24-ingestion
Jun 15, 2024

Conversation

iconstantinescu
Collaborator

@iconstantinescu iconstantinescu commented Jun 10, 2024

  1. In the GitHub sidebar, add workshop to work items and the current milestone
  2. In the GitHub sidebar, make sure to link to a corresponding PR under "Development"
  3. Make sure the branch is merged with the latest master branch
  4. Ensure that there are editors listed in the <meta> block
  5. If it's a workshop, add a <venue>ws</venue> tag
  6. Add events to their relevant SIGs
  7. Look at the venue listing for prior years, and ensure that the new volume titles are consistent. You can do this by clicking on the venue name from a paper page, which will take you to the venue listing.
  8. Navigate to the event page preview (e.g., https://preview.aclanthology.org/icnlsp-ingestion/events/icnlsp-2021/), and page through, to see if there are any glaring mistakes
  9. Skim through the complete listing, looking for mis-parsed author names.
  10. Download the frontmatter and verify that the table of contents matches at least three randomly-selected papers
  11. Download 3–5 PDFs (including the first and last one) and make sure they are correct (title, authors, page numbers).
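
Items 4 and 5 of the checklist lend themselves to a quick automated check. The following is a minimal sketch only, not the Anthology's own tooling: the sample XML is hypothetical, and the `<editor>` and `<volume>` element names are assumptions (the checklist itself only names `<meta>` and `<venue>ws</venue>`).

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet for illustration; element names other than
# <meta> and <venue> are assumptions about the XML layout.
SAMPLE = """
<collection id="example">
  <volume id="1">
    <meta>
      <booktitle>Proceedings of an Example Workshop</booktitle>
      <editor><first>Ada</first><last>Lovelace</last></editor>
      <venue>ws</venue>
    </meta>
  </volume>
  <volume id="2">
    <meta>
      <booktitle>Proceedings of Another Example Workshop</booktitle>
    </meta>
  </volume>
</collection>
"""


def check_volumes(xml_text, is_workshop=True):
    """Return (volume_id, problem) pairs for checklist items 4 and 5."""
    problems = []
    root = ET.fromstring(xml_text)
    for volume in root.iter("volume"):
        vol_id = volume.get("id", "?")
        meta = volume.find("meta")
        if meta is None:
            problems.append((vol_id, "missing <meta> block"))
            continue
        # Item 4: at least one editor must be listed in <meta>.
        if not meta.findall("editor"):
            problems.append((vol_id, "no editors listed"))
        # Item 5: workshops need a <venue>ws</venue> tag.
        venues = [v.text for v in meta.findall("venue")]
        if is_workshop and "ws" not in venues:
            problems.append((vol_id, "missing <venue>ws</venue>"))
    return problems


if __name__ == "__main__":
    for vol_id, problem in check_volumes(SAMPLE):
        print(f"volume {vol_id}: {problem}")
```

A check like this only covers the mechanical items; the visual steps (paging through the preview, comparing PDFs against the frontmatter) still need a human.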

github-actions bot commented Jun 10, 2024

Build successful. Some useful links:

This preview will be removed when the branch is merged.

@davidstap
Collaborator

Would it be possible to merge the main conference (with Findings) now, and merge the workshops later? That way, when people watch the Underline presentation videos, they can be pointed to the corresponding paper in the Anthology.

@mjpost
Member

mjpost commented Jun 13, 2024

This is our plan, though ideally we'd include tutorials, industry track, and SRW in this ingestion. There are also a number of other outstanding items that I hope to have included on this PR:

  • Correcting the capitalization rules for names
  • Uploading the PDFs
  • Adding the attachments, which seem to be missing
  • Getting ahold of the copyrights
  • Uploading the raw ingestion material to our import archive

I can already see that there are incorrectly capitalized names, e.g., Jan-Willem van de Meent. We have an issue for this (see, e.g., rycolab/aclpub2#171).
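
To illustrate how a name like the one above can go wrong, here is a minimal sketch. It is not the aclpub2 code, and the thread does not say what the actual bug is. It only contrasts naive title-casing with a particle-aware heuristic. The `PARTICLES` set and both function names are hypothetical.

```python
# Illustrative particle list; an assumption, not taken from aclpub2
# or the Anthology.
PARTICLES = {"van", "de", "der", "den", "von", "da", "di", "la", "le"}


def capitalize_token(token):
    """Capitalize each hyphen-separated part: 'jan-willem' -> 'Jan-Willem'."""
    return "-".join(p[:1].upper() + p[1:].lower() for p in token.split("-"))


def normalize_name(name):
    """Capitalize a full name, keeping non-initial particles lowercase."""
    out = []
    for i, token in enumerate(name.split()):
        if i > 0 and token.lower() in PARTICLES:
            out.append(token.lower())
        else:
            out.append(capitalize_token(token))
    return " ".join(out)


if __name__ == "__main__":
    raw = "jan-willem van de meent"
    print(raw.title())          # naive: Jan-Willem Van De Meent
    print(normalize_name(raw))  # heuristic: Jan-Willem van de Meent
```

A heuristic like this still fails on authors who capitalize their particles or use internal capitals (it would lowercase the "D" in "McDonald"), which is one reason name errors tend to turn into manual corrections later.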

@mjpost
Member

mjpost commented Jun 14, 2024

Ideally we'd have all of the other volumes, but I am having trouble even getting responses from the NAACL team, and don't want to spend time on this. The only real blocker is the name parsing: if we use the current aclpub2 code, it will introduce a ton of errors, which will then create a lot of work for us down the road handling corrections.

@mjpost
Member

mjpost commented Jun 15, 2024

@iconstantinescu is this ready for merging?

@mjpost mjpost marked this pull request as ready for review June 15, 2024 17:45
@mjpost mjpost merged commit c74b407 into master Jun 15, 2024
2 checks passed
@mjpost mjpost deleted the naacl-24-ingestion branch June 15, 2024 18:17
@iconstantinescu iconstantinescu restored the naacl-24-ingestion branch June 15, 2024 21:32
@mjpost mjpost deleted the naacl-24-ingestion branch June 20, 2024 12:29
3 participants