Error in attempting to load RDFa #34

josephguillaume · 2024-08-24T13:43:03Z

This is a rather niche error, but I thought I'd document it anyway.
I saved an index.html file in my movies folder and Media Kraken failed to load, with the error below.

Stack trace:

Error: Found illegal @id 'null'
    at o.handle (https://noeldemartin.github.io/media-kraken/js/0.0c3c3aec.worker.js?__WB_REVISION__=cc5697750ce9b8580d94d3a24979b00e:23:36647)
    at _.newOnValueJob (https://noeldemartin.github.io/media-kraken/js/0.0c3c3aec.worker.js?__WB_REVISION__=cc5697750ce9b8580d94d3a24979b00e:1:9452)
    at async _.executeBufferedJobs (https://noeldemartin.github.io/media-kraken/js/0.0c3c3aec.worker.js?__WB_REVISION__=cc5697750ce9b8580d94d3a24979b00e:1:13363)

I tracked the error down to this minimal example:

<!doctype html>
<html>
<body>
<span
  typeof="https://schema.org/Movie"
  property="https://schema.org/name">
Movie
</span>
</body>
</html>

It turns out I had defined a blank node of type schema:Movie, and Media Kraken was not able to cope with it.

I think it makes sense for Media Kraken to not support blank node movies.
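
To make the failure mode concrete, here is a minimal TypeScript sketch of how a consumer could notice that a schema:Movie has a blank node as its subject, and therefore no IRI to use as an `@id`. This is not Media Kraken's or soukai-solid's actual code: the `Term` and `Quad` shapes, the `findBlankNodeMovies` helper, and the `example.org` URL are all hypothetical, and the sample triples assume the RDFa above is parsed into a blank node typed as a movie.

```typescript
// Hypothetical, simplified RDF term and quad shapes (not a real library's types).
type Term =
    | { termType: 'NamedNode'; value: string }
    | { termType: 'BlankNode'; value: string }
    | { termType: 'Literal'; value: string };

interface Quad {
    subject: Term;
    predicate: Term;
    object: Term;
}

const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const SCHEMA_MOVIE = 'https://schema.org/Movie';

// Returns the labels of all blank nodes that are typed as schema:Movie.
function findBlankNodeMovies(quads: Quad[]): string[] {
    return quads
        .filter(q =>
            q.predicate.value === RDF_TYPE &&
            q.object.termType === 'NamedNode' &&
            q.object.value === SCHEMA_MOVIE &&
            q.subject.termType === 'BlankNode')
        .map(q => q.subject.value);
}

const named = (value: string): Term => ({ termType: 'NamedNode', value });
const blank = (value: string): Term => ({ termType: 'BlankNode', value });

// Assumed parse result for the minimal example (placeholder document URL).
const exampleQuads: Quad[] = [
    { subject: blank('b0'), predicate: named(RDF_TYPE), object: named(SCHEMA_MOVIE) },
    {
        subject: named('https://example.org/movies/index.html'),
        predicate: named('https://schema.org/name'),
        object: blank('b0'),
    },
];

console.log(findBlankNodeMovies(exampleQuads));
```

A check like this would let an application skip or report such resources before trying to build a model from them.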

However, I don't think it's the intended behaviour that Media Kraken tried to parse RDFa in index.html in the first place?
While it is correct that an HTML file in the movies folder could contain valid data, I don't think Media Kraken is set up to write to it.

I suspect that the intended behaviour here would be for Media Kraken to ignore invalid data, perhaps with a warning.

NoelDeMartin · 2024-08-25T18:43:09Z

Hey, thanks for reporting this and getting to a minimal reproduction.

I am aware that blank nodes are not supported; that's a known limitation tracked here: NoelDeMartin/soukai-solid#19

However, it's very weird that it's parsing the HTML because I haven't done any of that :/. The fetch request includes an Accept: text/turtle header, so maybe it's the Pod parsing the HTML and returning Turtle? Which Pod server are you using?
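
For readers unfamiliar with content negotiation, the kind of request described here can be sketched as follows. This is an illustrative TypeScript fragment, not the code Media Kraken actually uses: `turtleRequestInit` is a hypothetical helper and the URL in the usage comment is a placeholder. A server that can convert between RDF serializations may answer such a request for an HTML+RDFa file with Turtle.

```typescript
// Hypothetical helper: builds fetch options that ask the server for Turtle.
function turtleRequestInit(): { method: string; headers: Record<string, string> } {
    return {
        method: 'GET',
        headers: { Accept: 'text/turtle' },
    };
}

// Usage with the standard Fetch API (placeholder URL):
//
//   const response = await fetch('https://example.org/movies/index.html', turtleRequestInit());
//   console.log(response.headers.get('Content-Type'));
//   console.log(await response.text());
```

Inspecting the Content-Type and body of the response is one way to check whether the server, rather than the client, did the RDFa parsing.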

In any case, I agree that this type of error should probably be handled explicitly, so I'll leave this issue open at least until I handle that. I'll leave it as an enhancement, though.

josephguillaume · 2024-08-27T11:38:28Z

That's right. I'm using Community Solid Server. I've now checked and can confirm that the minimal example is returned as:

_:df_10585_0 a <https://schema.org/Movie>.
<https://server/movies/test.html> <https://schema.org/name> _:df_10585_0.

This is possibly an argument for avoiding a data model that overloads ldp:contains - the assumption is made that all the resources referenced by ldp:contains should be parsed as movies, and this is not necessarily the case.
It can be enforced with a shape tree, but at the expense of imposing a closed world assumption.
An alternative would be to use an additional predicate to ldp:contains, e.g. schema:itemListElement, as you have done elsewhere.
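
The difference between the two approaches can be sketched in TypeScript. This is a simplified illustration under stated assumptions, not how Media Kraken is implemented: the `Triple` shape, the `objectsOf` helper, and all `example.org` URLs are hypothetical, and triples are reduced to plain strings.

```typescript
// Hypothetical, string-only triple shape for illustration.
interface Triple {
    subject: string;
    predicate: string;
    object: string;
}

const LDP_CONTAINS = 'http://www.w3.org/ns/ldp#contains';
const SCHEMA_ITEM_LIST_ELEMENT = 'https://schema.org/itemListElement';

// Returns every object linked from `subject` via `predicate`.
function objectsOf(triples: Triple[], subject: string, predicate: string): string[] {
    return triples
        .filter(t => t.subject === subject && t.predicate === predicate)
        .map(t => t.object);
}

// Placeholder container holding one movie document and one unrelated HTML file.
const container = 'https://example.org/movies/';
const containerTriples: Triple[] = [
    { subject: container, predicate: LDP_CONTAINS, object: container + 'test.html' },
    { subject: container, predicate: LDP_CONTAINS, object: container + 'movie-1' },
    { subject: container, predicate: SCHEMA_ITEM_LIST_ELEMENT, object: container + 'movie-1#it' },
];

// Treating ldp:contains as "these are movies" picks up test.html as well.
const everything = objectsOf(containerTriples, container, LDP_CONTAINS);

// An explicit predicate lists only what was declared to be a movie.
const declaredMovies = objectsOf(containerTriples, container, SCHEMA_ITEM_LIST_ELEMENT);

console.log(everything, declaredMovies);
```

With the explicit predicate, unrelated files can live in the same container without being interpreted as movies, at the cost of maintaining the extra links.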

NoelDeMartin · 2024-08-28T17:29:50Z

I see. The thing with ldp:contains is that that's how it works with the type index. I guess I could register an instance of a "movies list" or something instead, but the problem with that is that I have to make up a new class (as I mentioned for Umai). Also, most people using the type index are using it in this way (registering containers, not instances of lists). So changing that could harm interoperability.

All in all, seeing how things stand right now, I think the best solution is to handle these malformed document errors. What I'm unsure about is whether to bother users with a warning or something, or to silently ignore the problem. I think I'll end up doing the latter, but showing some warning in the console for developers trying to debug what's going on.
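
A minimal TypeScript sketch of that idea follows, assuming each document is parsed independently so that one failure does not abort the whole load. It is not the fix that was or will be implemented: `loadTolerantly`, `LoadResult`, the fake parser, and the `example.org` URLs are hypothetical names for illustration.

```typescript
// Hypothetical result shape: parsed items plus a record of what was skipped.
interface LoadResult<T> {
    loaded: T[];
    skipped: { url: string; reason: string }[];
}

// Parses each document on its own; failures are logged and skipped, not rethrown.
function loadTolerantly<T>(
    urls: string[],
    parse: (url: string) => T,
    warn: (message: string) => void = console.warn,
): LoadResult<T> {
    const result: LoadResult<T> = { loaded: [], skipped: [] };

    for (const url of urls) {
        try {
            result.loaded.push(parse(url));
        } catch (error) {
            const reason = error instanceof Error ? error.message : String(error);

            result.skipped.push({ url, reason });
            warn(`Skipping malformed document ${url}: ${reason}`);
        }
    }

    return result;
}

// Fake parser standing in for real document parsing: rejects the HTML file.
function fakeParse(url: string): { url: string } {
    if (url.endsWith('.html')) {
        throw new Error("Found illegal @id 'null'");
    }

    return { url };
}

const loadResult = loadTolerantly(
    ['https://example.org/movies/movie-1', 'https://example.org/movies/test.html'],
    fakeParse,
);

console.log(loadResult.loaded.length, loadResult.skipped.length);
```

Keeping the skipped documents in the result leaves the door open to surfacing them to users later, while the default behaviour stays limited to a console warning.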

josephguillaume · 2024-08-29T12:09:39Z

The type index spec is a little unclear about what solid:instanceContainer actually contains, but I agree that your interpretation seems to be the one in use, and have added an issue to document that view (in the process providing my opinion on a query Angelo had raised about this).

I also quoted the position that "Linked data is a set of documents", which came up in a disagreement as to whether the document (in that case a type index) should use predicates to link to its contents (in that case a type registration), or whether it is sufficient for the contents to be in the document.

Personally I have used both patterns.
I use contents in a document if I want a store of many instances, i.e. the document is just a vessel.
I link from a list to instances if I want to be able to enumerate them - even without dereferencing the URIs of the instances.

It seems both approaches can easily coexist, so I don't have an issue with Media Kraken sticking to this approach.

NoelDeMartin added the enhancement (New feature or request) label Aug 25, 2024

josephguillaume mentioned this issue Aug 29, 2024

Confirm that solid:instanceContainer container contains documents containing instances? solid/type-indexes#34

Open
