Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Octave language with its own language distinct from MATLAB #6936

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

pr0m1th3as
Copy link

@pr0m1th3as pr0m1th3as commented Jul 11, 2024

Description

This PR is adding a distinction between MATLAB and GNU Octave languages, which both use .m files for their code files. Although GNU Octave is often described as the open source MATLAB alternative due to their similarities, the reality is that it is a separate programming language. Most importantly, although MATLAB syntax can be interpreted by GNU Octave, the opposite is not true. Octave syntax includes # for comment lines, and endif, endfor, endfunction, endclassdef etc as block's closing statements, which are explicitly recognized by the Octave interpreter, but cannot be parsed/run in MATLAB. As a result of these differences, GitHub falsely recognizes Octave code as MATLAB and by using MATLAB's syntax highlighting grammar results into mis-displaying code, which is written explicitly for the Octave language.

Besides the mis-highlighted code syntax of files containing Octave code, this is also confusing for users who might consider that code written for Octave may work in MATLAB, since GitHub reports all Octave code as MATLAB. GNU Octave is a community driven open source project, which has been in constant development for more than 30 years now, and with a reasonably large user base across academic and industrial sectors, especially in engineering fields. Thus, there is also an ethical aspect towards our community of developers and users that the Octave language should not be misclassified as another proprietary language by the worlds largest developer platform, GitHub, which gratefully provides free hosting for a myriad of open source projects.

Checklist:

pr0m1th3as and others added 12 commits June 30, 2024 22:26
Octave is a "largely Matlab-compatible" language, but it has several syntax elements distinct from Matlab, and which are in common use by Octave coders and encouraged as conventional Octave style. This grammar supports Octave with those variants, and without new Matlab syntax elements which are not present in Octave.
Add language: Octave, as distinct from Matlab
@pr0m1th3as pr0m1th3as requested a review from a team as a code owner July 11, 2024 17:06
Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments.

Additionally, several of your samples are too big for our needs (the ones suppressed in the diff). Please replace them with smaller samples.

You also need to add the cached license file the add-grammar script should have created when you ran it.

If you can't find it or it didn't get create it, you can do it manually by running:

bundle exec licensed cache -c vendor/licenses/config.yml

lib/linguist/languages.yml Outdated Show resolved Hide resolved
test/test_heuristics.rb Outdated Show resolved Hide resolved
@lildude lildude changed the title Add Octave language with its own highlight syntax grammar distinct from MATLAB Add Octave language with its own language distinct from MATLAB Jul 12, 2024
@tpapastylianou
Copy link

Spotted this PR and just wanted to add that, while octave and matlab do indeed share an odd relationship, matlab is not the only language with which github language detection clashes with, particularly when ".m" files are involved, which are also used by objective-c, wolfram mathematica, and mercury-lang as well.

So I would say this is less a case of "making octave distinct from matlab" specifically per se, but more about adding octave as a proper language, and disambiguating octave files among all the above languages that also use the ".m" extension, including matlab (whose syntax is indeed different, despite the many similarities).

E.g. if one searches github for "language:Objective-C octave" (or indeed "language:Objective-C matlab") one will witness this confusion, and that it's not necessarily octave/matlab-exclusive.

@pr0m1th3as pr0m1th3as requested a review from lildude July 12, 2024 16:43
@pr0m1th3as
Copy link
Author

@lildude I have made all necessary changes to the best of my understanding of the errors produced from the automated tests. Please, run once again the testing worklfow and give me some guidance about the necessary changes that might still be required. Thank you very much for your help.

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better. Just a few more suggestions.

lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
@pr0m1th3as
Copy link
Author

@lildude Are there any other changes required before this PR can be merged?

@lildude
Copy link
Member

lildude commented Aug 18, 2024

@lildude Are there any other changes required before this PR can be merged?

Yup. See the test failure.

Copy link
Collaborator

@Alhadis Alhadis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the distinction between MATLAB and Octave that great that Octave projects don't contribute to the usage statistics of MATLAB, and vice versa? If not, then the new Octave entry should be grouped under MATLAB (using a group: MATLAB field).

Otherwise, this will make a mess of classification: I can already see projects being classified as a "mixture" of MATLAB and Octave files, purely because each file was analysed in isolation, as per Linguist's implementation.

Grouping won't stop you from selecting a more accurate grammar to highlight Octave source (and by extension, CodeMirror modes). In fact, this is a large part of why this language "grouping" feature exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants