Remove programming languages? #265

Open
TomazErjavec opened this issue Apr 25, 2024 · 12 comments
Labels: SIS:domains (related to functional domains as represented in the SIS); SIS:formats (format description and placement in the SIS)

@TomazErjavec
Collaborator

Much as I love Perl, I find it strange that it is included in the formats:

  • of all possible programming languages (many of them much more popular today), the list gives only Perl and Lisp
  • the list is about text-related encoding standards, not programming languages, and it should maybe stay that way

Just my 2 late night cents.

TomazErjavec added the invalid, SIS:formats (format description and placement in the SIS), and templatic (this label is slapped automatically on issues coming from templates) labels on Apr 25, 2024
@bansp
Member

bansp commented May 2, 2024

Thank you, Tomaž. It does seem to be outside the pattern. Gonna handle that when I'm back from vacation.

Maybe code should be handled by 'plain text in a (new) specific domain' rather than merely plain text in tool support (which is where, I guess, both these non-formats should be placed within the current system).

The domain system should be modified anyway to handle experimental-linguistic formats. Maybe a good topic for the upcoming SIC meeting.

bansp added the SIS:domains label (related to functional domains as represented in the SIS) and removed the templatic label (this label is slapped automatically on issues coming from templates) on May 2, 2024
bansp added this to the SIS v. 2.7.0 milestone on May 2, 2024
bansp removed the invalid label on May 7, 2024
@bansp
Member

bansp commented May 17, 2024

We have found out that the programming language reference is only virtual: Lisp and Perl are referenced by the unmaintained recommendations by EKUT and CLARIN.SI. "Unmaintained" means that they must have been transferred by hand into the SIS from the "Spreadsheet Era" ;-)

The current state is indeed erroneous also from the point of view of the domain used (image not preserved): these entries are artefacts of the hand conversion that took place in the early days of the new SIS.

I am going to fix all three (remove the first two, modify the VRT).

@TomazErjavec
Collaborator Author

> Lisp and Perl are referenced by the unmaintained recommendations by EKUT and CLARIN.SI.

This is/was a bit of a vicious circle, at least for CLARIN.SI: because I saw that the two programming languages were an option, and we have nothing against having programs in these two languages in the repo, I ticked them. In other words, don't refrain from deleting them (as it seems you won't) just because CLARIN.SI allows them.

@bansp
Member

bansp commented May 20, 2024

Hi Tomaž,
Ah, it took a bit of a sentimental journey to the Spreadsheet Era for me to understand what you meant by "ticking the options" -- indeed, back then, Perl, Lisp and R were present in the list of formats.

I'd rather not trace at this point how they became part of the recommendations and why they were placed in a rather inappropriate domain -- that was probably due to some quick decision-making when encoding the content of the CSC spreadsheet in the SIS. You probably recall that we (Eliza, Hanna, and myself) did a very, very quick job of up-converting the content of the spreadsheet, in the hope that each centre would soon afterwards improve the result in turn. (Naive youngsters... :-))

In my commit registered above, I missed the part where I should have deleted the two references. Now I've done that, in line with your initial comment, whose sentiment I share.

Still, that does leave us with a task of how to recommend to centres what to do when they want to say that they are OK with Perl and Python and even Lisp, but not OK with Microsoft Basic or something such. One place is the general info, another, that feels like a hack, is something like "plain text" in the domain of "Tool Support" -- totally unintuitive, I'm afraid. We could also have a fake format file that is called "Programming Language" and has all the characteristics of (Unicode) plain text, and, again, uses the comment section for the name. Feels relatively bad as well.

@TomazErjavec
Copy link
Collaborator Author

> Still, that does leave us with a task of how to recommend to centres what to do when they want to say that they are OK with Perl and Python and even Lisp, but not OK with Microsoft Basic or something such.

Personally, I'd just ignore programming languages and concentrate exclusively on language resources in the SIS. The two are really different, and if a centre wants to say Lisp yes, Basic no (or, more likely, source yes, compiled code no), then they should say so somewhere on their pages, and we could leave it up to them.

Also, the current "Tool Support" domain has, in my mind, nothing to do with tools. DTD, TAR, etc. do not really seem like tool support to me.

bansp moved this from the SIS v. 2.7.0 milestone to SIS v. 2.8.0 on May 24, 2024
@bansp
Member

bansp commented Jun 5, 2024

Good point about ZIP, TAR, and friends. I'm not at all sure that there is a domain where they fit, other than... "Other".

@bansp
Member

bansp commented Jun 12, 2024

I've only recently seen a centre page that specifically stated that they want tools as well (gosh, I've seen too many in a short time, can't recall which centre it was), and by "tools" they did mean source code. So maybe a generic fSrcCode (text/plain) or something like that...?

@bansp
Member

bansp commented Jun 12, 2024

Ah, it's stated in the DSpace-derived FAQ, e.g. here: https://clarin-pl.eu/dspace/page/faq#what-submissions-do-you-accept
I probably meant that.

> but also trained language models, parsers, taggers, MT systems, linguistic web services

No specific mention of source code, but it seems it's inferrable for some of the above.

@TomazErjavec
Collaborator Author

> So maybe a generic fSrcCode (text/plain)

Maybe, although I still think that removing the whole "programming languages" dimension from the SIS would be better.

@bansp
Member

bansp commented Jun 17, 2024

I've found the quote I probably had in mind, above. It comes from BAS:

> The BAS also under certain circumstances accepts software as a linguistic resource, if the software's aim is the analysis, processing and/or administration of scientific phonetic data.
> (source)

... so there's "popular demand" of a sort, it seems.

Having a single description, like fSrcCode, would maybe be an appropriate level of compromise: not encouraging a proliferation of descriptions of particular languages, and at the same time highlighting the source aspect (as opposed to compiled). The <comment> field would then be a place for stating that, e.g. Perl is discouraged and Python loved, etc.

@TomazErjavec
Collaborator Author

> Having a single description, like fSrcCode, would maybe be an appropriate level of compromise

Sure, if you don't want to ignore them completely, this would be the way to go I think.

@bansp
Member

bansp commented Nov 21, 2024

We've already seen some new languages creeping in, and I have someone to talk to about that now, so let me use the oldest trick up a developer's sleeve: move the issue to the next milestone... ;-)

bansp moved this from the SIS v. 2.8.0 milestone to SIS v. 2.9.0 on Nov 21, 2024