Multi-input wrappers like `InputSource` or `TranscoderInput` considered harmful #64

carlosame (Maintainer) · 2022-09-07T13:02:23Z

Some old but important Java interfaces use classes like InputSource or TranscoderInput to encapsulate several possible inputs for certain specialized methods.

When these APIs were designed, it looked like a good idea to put every possible input source inside a single object, but this approach confuses and misguides people. It also goes against the principles of Separation of Concerns ("do not overlap") and Single Responsibility ("every object should have a single responsibility").

The main issues

Both InputSource and TranscoderInput can take an InputStream or a Reader. In the case of InputSource, one may also set the encoding with setEncoding(). But if one knows the encoding, why not just construct a Reader and pass that?
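To make the comparison concrete, here is a minimal sketch using only the JDK (`org.xml.sax.InputSource` and `javax.xml.parsers.DocumentBuilder`); the class and method names are hypothetical. It parses the same bytes two ways: once handing the encoding name to `InputSource`, once decoding the bytes into a `Reader` first, so that the parser only ever sees characters.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodingChoices {

    // Option 1: give the parser raw bytes plus an encoding hint.
    static Document parseWithEncodingHint(InputStream bytes, String encoding) throws Exception {
        InputSource source = new InputSource(bytes);
        source.setEncoding(encoding); // parser must honour this instead of guessing
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    // Option 2: decode the bytes ourselves; the encoding question never reaches the parser.
    static Document parseFromReader(InputStream bytes, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(bytes, charset);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(reader));
    }
}
```

Both calls should yield the same tree; the second one simply makes it explicit who decided the encoding.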

But let's focus on the case where one passes an InputStream but not the encoding. The document parser must then figure out the encoding by itself, which is not a trivial task. Maybe the HTTP headers provided encoding information, but once the stream is wrapped in an InputSource that information is lost (or simply ignored by a developer who knows that InputSource takes on this responsibility).
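One way to avoid losing the HTTP encoding information is for the application to read the `charset` parameter of the `Content-Type` header and build the `Reader` itself. A minimal sketch, assuming a UTF-8 fallback when the header has no usable charset (a real application might sniff the stream instead); `HeaderCharset` is a hypothetical name:

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class HeaderCharset {

    /** Returns the charset named in a Content-Type value, or fallback if absent or unknown. */
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in: charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes the body with the declared charset (assumed UTF-8 if none). */
    static Reader readerFor(InputStream body, String contentType) {
        return new InputStreamReader(body, fromContentType(contentType, StandardCharsets.UTF_8));
    }
}
```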

Although one can provide reasonably good approximations by inspecting the stream, doing the task correctly (closer to what web browsers do) would require a good Unicode library, one that is updated frequently. Only one non-mainstream parser actually does that, and it is a responsibility that we probably want to separate from the class that parses or transcodes a document.
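As an example of the "reasonably good approximation" kind, and of keeping it in its own component, here is a minimal sketch that only looks for a byte order mark. It is nowhere near browser-grade detection: it ignores UTF-32 (whose little-endian BOM starts like the UTF-16LE one) and returns `null` when no BOM is present, leaving the decision to the caller. `BomSniffer` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {

    /** Looks for a UTF-8 or UTF-16 byte order mark; returns null if none is found. */
    static Charset detectBom(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /**
     * Peeks at the first bytes without consuming them. The caller still has
     * to skip the BOM itself when decoding.
     */
    static Charset sniff(BufferedInputStream in) throws IOException {
        in.mark(3);
        byte[] head = in.readNBytes(3);
        in.reset();
        return detectBom(head);
    }
}
```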

One can also provide just a URI string, and then the parser or transcoder shall connect to that resource and process it. This, however, has important security implications. First, what timeout should the parser or transcoder use? A short timeout may produce I/O errors, but a long one opens a window for a Denial of Service. Only the final application knows the correct answer.
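If the application opens the connection itself, the timeout becomes an explicit, application-level decision. A minimal sketch with the JDK `java.net.http.HttpClient` (Java 11+); the durations are placeholder assumptions, not recommendations, and `TimeoutPolicy` is a hypothetical name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class TimeoutPolicy {

    // Placeholder values: only the final application knows the right ones.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(5);
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(20);

    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT) // limit for establishing the connection
                .build();
    }

    static HttpRequest newRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .timeout(REQUEST_TIMEOUT) // send() throws HttpTimeoutException if exceeded
                .GET()
                .build();
    }

    /** Fetches the resource; the body stream can then be handed to a parser. */
    static HttpResponse<InputStream> fetch(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        return client.send(newRequest(uri), HttpResponse.BodyHandlers.ofInputStream());
    }
}
```

The parser or transcoder then receives a stream and never has to guess a timeout.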

What should be done if the connection is redirected to a server in a trusted network, like the internal network? The standard Java connection shall faithfully follow it. Do we really want the parser or the transcoder to carry this responsibility, or is it something for the calling application to deal with?
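Handled by the application, the redirect question can look like the following sketch: a client that never follows redirects on its own, plus a check that the application could run on the `Location` target before deciding to follow it. The address policy here is an assumption for illustration (it rejects loopback, link-local such as 169.254.169.254, site-local and wildcard addresses) and is not exhaustive; it also does not protect against DNS rebinding, since the name may resolve differently at connection time. `RedirectPolicy` is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.net.http.HttpClient;

class RedirectPolicy {

    /** A client that reports 3xx responses instead of following them. */
    static HttpClient noRedirectClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
    }

    /** Decides whether a redirect target may be followed under this sketch's policy. */
    static boolean isAllowedTarget(URI target) throws UnknownHostException {
        String host = target.getHost();
        if (host == null) {
            return false;
        }
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            if (addr.isLoopbackAddress() || addr.isLinkLocalAddress()
                    || addr.isSiteLocalAddress() || addr.isAnyLocalAddress()) {
                return false; // points into a local or internal network
            }
        }
        return true;
    }
}
```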

And then we have the reverse of the earlier problem: the parser won't return the header information, including the important CORS headers.
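When the application does the fetching, nothing stops it from keeping the headers next to the body. A minimal sketch of such a holder, built on the JDK `HttpResponse` and `HttpHeaders` types; `FetchedDocument` and its accessors are hypothetical names:

```java
import java.net.URI;
import java.net.http.HttpHeaders;
import java.net.http.HttpResponse;
import java.util.Optional;

/** The body travels together with the metadata that a wrapper would drop. */
final class FetchedDocument {
    final URI uri;          // final URI, after any redirects the client followed
    final HttpHeaders headers;
    final byte[] body;

    FetchedDocument(URI uri, HttpHeaders headers, byte[] body) {
        this.uri = uri;
        this.headers = headers;
        this.body = body;
    }

    static FetchedDocument from(HttpResponse<byte[]> response) {
        return new FetchedDocument(response.uri(), response.headers(), response.body());
    }

    Optional<String> contentType() {
        return headers.firstValue("Content-Type");
    }

    /** The CORS header the application may need for its own policy decisions. */
    Optional<String> allowOrigin() {
        return headers.firstValue("Access-Control-Allow-Origin");
    }
}
```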

So let's provide a Reader or an InputStream instead; now we do not need to provide a URI, right? The problem is that without the URI we cannot apply origin-based policies when transcoding, starting with the default same-origin policy.
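The reason the URI matters can be seen in a simplified same-origin check: every reference found in the document is resolved against the document URI and then compared by scheme, host and port. Without a document URI there is nothing to compare against. This is a sketch under simplifying assumptions (only http/https default ports, host-less URIs such as `file:` treated as cross-origin); `SameOrigin` is a hypothetical name.

```java
import java.net.URI;
import java.util.Locale;

class SameOrigin {

    /** Port to compare: the explicit one, else the default for http/https. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        switch (uri.getScheme().toLowerCase(Locale.ROOT)) {
            case "http":
                return 80;
            case "https":
                return 443;
            default:
                return -1;
        }
    }

    /** Scheme, host and port must all match; no document URI means no origin. */
    static boolean isSameOrigin(URI documentURI, URI reference) {
        if (documentURI == null || reference == null) {
            return false;
        }
        // Relative references are resolved against the document URI first.
        URI target = documentURI.resolve(reference);
        if (documentURI.getScheme() == null || target.getScheme() == null
                || documentURI.getHost() == null || target.getHost() == null) {
            return false; // host-less URIs are treated as cross-origin here
        }
        return documentURI.getScheme().equalsIgnoreCase(target.getScheme())
                && documentURI.getHost().equalsIgnoreCase(target.getHost())
                && effectivePort(documentURI) == effectivePort(target);
    }
}
```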

TranscoderInput can also take a DOM Document as input. However, the transcoder expects the document to have been built with one of its two SVG DOM implementations; otherwise, it just imports the old document into a new one and then discards the old DOM. This wastes resources.
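The cost of that fallback is easy to see in a generic sketch written against the standard `org.w3c.dom` API (this is not the transcoder's actual code, and the class and method names are hypothetical): when the document does not come from the expected implementation, the whole tree is deep-copied, so two full trees coexist until the original is discarded. For brevity the sketch copies only the document element, ignoring any doctype, comments or processing instructions outside it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomImportSketch {

    /** Returns doc unchanged if it already belongs to target, otherwise a deep copy. */
    static Document ensureImplementation(Document doc, DOMImplementation target) {
        if (doc.getImplementation() == target) {
            return doc; // right implementation: no copy needed
        }
        return copyInto(doc, target);
    }

    /** Deep-copies the document element into a new document created by target. */
    static Document copyInto(Document doc, DOMImplementation target) {
        Element root = doc.getDocumentElement();
        Document copy = target.createDocument(root.getNamespaceURI(), root.getNodeName(), null);
        // Replace the placeholder root with an imported (deep) copy of the original tree.
        copy.replaceChild(copy.importNode(root, true), copy.getDocumentElement());
        return copy;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```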

Moreover, applications do not always pick the right DOM implementation, or may not set the correct KEY_DOM_IMPLEMENTATION hint. This has been the origin of some application bugs (for example, SVG 1.2 elements appearing in 1.1 documents and causing class cast exceptions).

It is also worth noting that a bug in the area of DOM imports was fixed in EchoSVG, which indicates that this input approach was not as well tested as the others (although everything should be fine in that area now).

Conclusions

There are other, smaller mistakes that developers can make when using those input wrappers. Instead of the wrappers, I suggest using simple interfaces that take the minimum responsibility and make explicit what information they really need. Those interfaces will not be as flexible as the ones using the old wrappers, but they should lead to better security and a better use of resources.
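As an illustration of the kind of interface I mean, here is a hypothetical sketch; neither `DocumentProcessor` nor `CopyProcessor` exists in EchoSVG or any other library. The caller has already fetched and decoded the document (applying its own timeouts, redirect rules and charset handling), and the interface asks for exactly what it needs: the characters, plus the document URI so that origin-based policies can still be applied. The toy implementation just copies the text through.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.net.URI;
import java.util.Objects;

/** Hypothetical narrow interface: decoded content plus the URI it came from. */
interface DocumentProcessor {
    void process(Reader content, URI documentURI, Writer out) throws IOException;
}

/** Toy implementation for illustration only. */
final class CopyProcessor implements DocumentProcessor {
    @Override
    public void process(Reader content, URI documentURI, Writer out) throws IOException {
        // The URI is required, not optional: without it there is no origin to check against.
        Objects.requireNonNull(documentURI, "documentURI is required for origin checks");
        content.transferTo(out);
    }
}
```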
