Disable automatic string datatype - possible? #2973

Open
hoijui opened this issue Jan 23, 2025 · 6 comments

@hoijui
Contributor

hoijui commented Jan 23, 2025

Version

5.4.0-SNAPSHOT

Question

I am maintaining an open source RDF/OWL linter based on Jena.
One of the things it checks is whether all literals have a datatype explicitly set.
Until now it was based on Jena 2, but now I am updating it to Jena 5, which automatically adds xsd:string as the datatype, if none is explicitly given (where Jena 2 stored null). I am using org.apache.jena.rdf.model.test.ModelTestBase.statement( final Model m, final String fact ) to create the model like so:

Statement source = statement(m, "a P 'untypedString'");
// ... or ...
Statement source = statement(m, "a P 'untypedString'xsd:string");

The above now both result in the same model.

Any ideas how I could detect the difference in these two, without re-implementing all of the parsing?

@hoijui
Contributor Author

hoijui commented Jan 23, 2025

Could it be an option to change the code to be able to set an alternative default data-type?
(Which I could then set to some self invented NONE type)

@afs
Member

afs commented Jan 23, 2025

In RDF 1.1, all literals have a datatype.

"abc" is syntax for "abc"^^xsd:string.
"abc"@de has datatype rdf:langString.

The old RDF 1.0 "plain literal" concept no longer exists.

The above now both result in the same model.

In RDF 1.1, they are the same model.
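
To illustrate why the two forms cannot be told apart after parsing, here is a toy model of an RDF 1.1 literal in plain Java. `ToyLiteral` and its factory methods are hypothetical names invented for this sketch; this is not Jena's API, only a picture of the rule that a literal written without a datatype receives xsd:string.

```java
import java.util.Objects;

/** Toy model of an RDF 1.1 literal. Hypothetical class, not part of Jena. */
public final class ToyLiteral {
    public static final String XSD_STRING =
            "http://www.w3.org/2001/XMLSchema#string";
    public static final String RDF_LANG_STRING =
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    private final String lexicalForm;
    private final String datatypeIri; // never null: every literal has a datatype
    private final String lang;        // empty when there is no language tag

    private ToyLiteral(String lexicalForm, String datatypeIri, String lang) {
        this.lexicalForm = lexicalForm;
        this.datatypeIri = datatypeIri;
        this.lang = lang;
    }

    /** "abc": no datatype written, so xsd:string is implied. */
    public static ToyLiteral plain(String lexicalForm) {
        return new ToyLiteral(lexicalForm, XSD_STRING, "");
    }

    /** "abc"^^datatype: the datatype is written out. */
    public static ToyLiteral typed(String lexicalForm, String datatypeIri) {
        return new ToyLiteral(lexicalForm, datatypeIri, "");
    }

    /** "abc"@de: the datatype is rdf:langString. */
    public static ToyLiteral langTagged(String lexicalForm, String lang) {
        return new ToyLiteral(lexicalForm, RDF_LANG_STRING, lang);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ToyLiteral)) {
            return false;
        }
        ToyLiteral o = (ToyLiteral) other;
        return lexicalForm.equals(o.lexicalForm)
                && datatypeIri.equals(o.datatypeIri)
                && lang.equals(o.lang);
    }

    @Override
    public int hashCode() {
        return Objects.hash(lexicalForm, datatypeIri, lang);
    }

    public static void main(String[] args) {
        // Same literal: the information "no datatype was written" is gone.
        System.out.println(plain("abc").equals(typed("abc", XSD_STRING)));
        // Different literal: a language tag changes the datatype.
        System.out.println(plain("abc").equals(langTagged("abc", "de")));
    }
}
```

The first comparison prints `true` and the second `false`: once the value is built, nothing records whether the datatype was spelled out in the source text.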

As of Jena 3, Jena supports RDF 1.1. Jena no longer has an RDF 1.0 mode. This runs quite deep - code assumes there is a datatype.

Good to see Eyeball continues!

@hoijui
Contributor Author

hoijui commented Jan 24, 2025

Thank you @afs! :-)
That was a good history lesson; didn't know.
Hmm... I currently see no way to keep this test in Eyeball then, short of staying with Jena 2, or using Jena 5 for everything else while also doing all the parsing and analyzing with Jena 2 just for this check, which would be super ugly (not to mention the dependency and class-path issues).

I guess the fundamental issue here is, once more, something I have seen quite a few times already: linters really should not use production-oriented parsers and libraries, as that makes detecting a whole range of issues impossible.
I definitely do not have the resources to change that for Eyeball, though.

@afs
Member

afs commented Jan 24, 2025

The last Jena 2 release was 2015-03-13.

There is little to no RDF 1.0 data anymore. The transition went very smoothly. For Jena there were maybe one or two questions, and the answers seemed to satisfy. No "this totally breaks my system". It must have been because people used either "abc" or "abc"^^xsd:string but didn't mix them.

The issues to detect will fall into "ill-advised" ways of writing the data, where a custom parser will help, and higher-level anti-patterns, right through to incorrect use of ontologies. The trouble with RDF is that there are multiple syntaxes.

For "Eyeball-NG", who are the users? What data do they check?

For checking IRIs, the new jena-iri3986 system will be much easier to use and to tune warnings/errors. It's not currently flexible enough to add new checks (doable, there hasn't been a need) or to add custom URI schemes (again, doable).

@afs
Member

afs commented Jan 24, 2025

FYI: The Data Shapes Working Group (SHACL 1.2) is just getting started. SHACL does data validation above the level of parsing.

https://www.w3.org/TR/shacl/

If there are tasks that you think are useful that aren't covered, please do raise an issue.

@hoijui
Contributor Author

hoijui commented Jan 24, 2025

Hmm....
So basically... while in RDF 1.0 this Eyeball check (for a missing explicit datatype) was feasible and valid on the data level, in RDF 1.1 it is only possible to check on the level of the RDF serialization syntax.
Eyeball is not designed or set up to operate on this level.
So I would say it is valid to just deprecate/remove this check, and leave a note behind documenting all that I have now learned from you. Maybe there are other linters that operate on this level.
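
For the note left behind, a rough idea of what a check on the serialization level could look like. This is only a sketch in plain Java, not Eyeball or Jena code: `UntypedLiteralScan` and `findUntyped` are made-up names, and it assumes N-Triples-like input with double-quoted literals only. It does not understand comments, single-quoted or triple-quoted Turtle strings, so it is no replacement for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical syntax-level scan for literals written without a datatype. */
public final class UntypedLiteralScan {

    /**
     * Returns the raw lexical forms of double-quoted literals that are
     * followed by neither "^^" (datatype) nor "@" (language tag).
     */
    public static List<String> findUntyped(String text) {
        List<String> untyped = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (text.charAt(i) != '"') {
                i++;
                continue;
            }
            int start = i + 1;
            int end = start;
            // Find the closing quote, stepping over backslash escapes like \"
            while (end < text.length() && text.charAt(end) != '"') {
                end += (text.charAt(end) == '\\') ? 2 : 1;
            }
            if (end >= text.length()) {
                break; // unterminated literal: stop scanning
            }
            boolean typed = text.startsWith("^^", end + 1);
            boolean langTagged = text.startsWith("@", end + 1);
            if (!typed && !langTagged) {
                untyped.add(text.substring(start, end));
            }
            i = end + 1;
        }
        return untyped;
    }

    public static void main(String[] args) {
        String data =
                "<a> <p> \"untypedString\" .\n"
              + "<a> <p> \"typed\"^^<http://www.w3.org/2001/XMLSchema#string> .\n"
              + "<a> <p> \"hallo\"@de .\n";
        // Only the first literal has no explicit datatype or language tag.
        System.out.println(findUntyped(data));
    }
}
```

Because it works on the text rather than on the parsed model, a scan like this has to be repeated for each RDF syntax it should cover.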

I know about SHACL, and have used it a tiny bit, but I am nowhere near deep enough into that stuff to be able to raise issues yet. I wrote a small tool to convert OWL ontologies to SHACL shapes. I know that is not logically valid, and people like you who know all the ins and outs of the two will pull out their hair, but for people like me, who use a subset of OWL to write a kind of data schema for distributed databases, the two encode mostly the same data, just using different terms. I only use a handful of properties (range, domain, sub-class, sub-property, type, cardinalities).
It seems to basically work how I want it to, with the very limited testing I did so far.

Eyeball-NG ...
We got funding, and some of it is dedicated to making a basic linter for RDF & OWL.
When organizing the funding, I was not aware that linters already exist (even though I had searched in the past). Now I am aware, and decided it makes more sense to raise one of them from the grave.
Really, we are more interested in OOPS, but the issue there is that it is not open source (which is a requirement of the funding we got, and also one we have personally). The optimal scenario for us would be to raise Eyeball from its (quasi) grave, convince the OOPS authors to make their tool open source, and then provide some cleanup/touch-up/general coding help after that.
We are currently (today, actually) preparing an argument for that. If you have something to say on the matter that we might forward to them, please do!
(Note: we had almost no personal contact with them so far, but I am a big fan of their work (OOPS, LOV, and some other things))

I am mostly a software dev, and not so much an academic. For me, RDF and OWL are IT tools, not so much what biologists, philosophy people, pharma/medical people and so on see in them. I don't write papers and such... you get the idea. I would love for them and me to be able to work together on this stuff, as I think we would be much better off this way: I am weak on the theory, specs and deep intrinsics, but I think I could improve their code quality somewhat (they are academics, foremost).

I plan to actively use eyeball myself, regularly (CI), for all of our ontologies and at least one of a friend.
In the best case, it could also be integrated into LOV, or even into OOPS (as they are both written in Java), and then it could benefit all ontologies on/users of LOV.


2 participants