Some old but important Java interfaces use classes like InputSource or TranscoderInput to encapsulate several possible inputs for certain specialized methods.
When these APIs were designed, it looked like a good idea to put every possible input source inside a single object, but this approach confuses and misleads people. It also goes against the principles of Separation of Concerns ("do not overlap") and of Single Responsibility ("every object should have a single responsibility").
The main issues
Both InputSource and TranscoderInput can take an InputStream or a Reader. In the case of InputSource, one may also set the encoding with setEncoding(), although if one knows the encoding, why not just construct a Reader and pass it?
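To make the comparison concrete, here is a minimal sketch of the two styles using only JDK classes. The class and method names (KnownEncoding, withSetEncoding, asReader, readAll) are made up for illustration and do not come from any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class KnownEncoding {

    /** Legacy style: the byte stream and its encoding travel as separate fields. */
    public static InputSource withSetEncoding(InputStream bytes, Charset charset) {
        InputSource source = new InputSource(bytes);
        source.setEncoding(charset.name());
        return source;
    }

    /** Alternative: decode up front, so the consumer only ever sees characters. */
    public static Reader asReader(InputStream bytes, Charset charset) {
        return new InputStreamReader(bytes, charset);
    }

    /** Demo helper: drains a Reader into a String. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<title>caf\u00e9</title>".getBytes(StandardCharsets.ISO_8859_1);
        try (Reader reader = asReader(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1)) {
            System.out.println(readAll(reader));
        }
    }
}
```

With the Reader variant, the consumer has no encoding decision left to make, which is the point of the question above.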
But let's focus on the case where one passes an InputStream but not the encoding. The document parser must then figure out the encoding by itself, which is not a trivial task. Maybe the HTTP headers provided encoding information, but once the stream is wrapped in an InputSource that information is lost (perhaps simply ignored by a developer who knows that InputSource takes on this responsibility).
Although one can obtain reasonably good approximations by inspecting the stream, doing the task correctly (closer to what web browsers do) would require a good Unicode library (which would need frequent updates). That's something that only one non-mainstream parser actually does, and a responsibility that we probably want to separate from the class that parses or transcodes a document.
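As an illustration of keeping the header information instead of losing it, the following sketch reads the charset parameter from a Content-Type header value and falls back to a caller-supplied default. It is a simplification under stated assumptions: the class and method names are hypothetical, the parsing is deliberately naive (it does not handle every quoting rule of the HTTP grammar), and it does not attempt any sniffing of the stream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type header value,
     * or the fallback if the header is absent, has no charset
     * parameter, or names a charset this JVM does not support.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("image/svg+xml; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("image/svg+xml", StandardCharsets.UTF_8));
    }
}
```

The caller can then build an InputStreamReader with the result and hand characters, not bytes, to the parser.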
One can also provide a URI string and nothing more, and then the parser or transcoder must connect to that resource and process it. This, however, has important security implications. First, what timeout should the parser or transcoder use? A small timeout may produce I/O errors, but a large one opens the door to a denial-of-service attack. Only the final application knows the correct answer.
What should be done if the connection is redirected to a server in a trusted network, such as the internal network? The standard Java connection will faithfully follow the redirect. Do we really want the parser or the transcoder to carry this responsibility, or is it something to be dealt with by the calling application?
And then we have the reverse of the previous problem: the parser does not hand back the response headers, including the important CORS headers.
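The concerns above (timeouts, redirects, access to the response headers) can all be handled by the calling application if it opens the connection itself. The following is a sketch using the JDK's URLConnection and HttpURLConnection; the class CallerManagedFetch, the Fetched holder and the policy of refusing every 3xx response are assumptions made for this example, not a recommendation for every application.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class, for illustration only.
public class CallerManagedFetch {

    /** What a URI-only API would hide: the stream plus selected response headers. */
    public static final class Fetched {
        public final InputStream body;
        public final String contentType;
        public final String allowOrigin;

        Fetched(InputStream body, String contentType, String allowOrigin) {
            this.body = body;
            this.contentType = contentType;
            this.allowOrigin = allowOrigin;
        }
    }

    /** Applies the application's own timeouts and disables automatic redirects. */
    public static void configure(URLConnection con, int connectMillis, int readMillis) {
        con.setConnectTimeout(connectMillis);
        con.setReadTimeout(readMillis);
        if (con instanceof HttpURLConnection) {
            ((HttpURLConnection) con).setInstanceFollowRedirects(false);
        }
    }

    /** Opens the resource; a 3xx response is reported instead of being followed. */
    public static Fetched fetch(URL url, int connectMillis, int readMillis) throws IOException {
        URLConnection con = url.openConnection();
        configure(con, connectMillis, readMillis);
        if (con instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) con).getResponseCode();
            if (status >= 300 && status < 400) {
                // Example policy: let the application decide what to do with the target.
                throw new IOException("Redirect refused: " + con.getHeaderField("Location"));
            }
        }
        return new Fetched(con.getInputStream(), con.getContentType(),
                con.getHeaderField("Access-Control-Allow-Origin"));
    }

    public static void main(String[] args) throws IOException {
        // No network traffic here: openConnection() does not connect by itself.
        URLConnection con = URI.create("https://example.com/image.svg").toURL().openConnection();
        configure(con, 2000, 5000);
        System.out.println(con.getConnectTimeout() + " ms / " + con.getReadTimeout() + " ms");
    }
}
```

The caller keeps the Content-Type and CORS headers alongside the stream and can pass on whatever the next stage needs.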
Suppose we provide a Reader or an InputStream instead. Now we do not need to provide a URI, right? The problem is that without the URI we cannot apply origin-based policies when transcoding, starting with the default same-origin policy.
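To see why the URI is needed, consider a simplified same-origin comparison (scheme, host and port). This sketch is not the check that any particular transcoder performs; the class name and the default-port handling for http and https are assumptions for illustration. Without a document URI there is simply nothing to compare a referenced resource against.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class SameOrigin {

    /** Simplified check: same scheme, host and effective port. */
    public static boolean sameOrigin(URI document, URI resource) {
        if (document.getScheme() == null || resource.getScheme() == null
                || document.getHost() == null || resource.getHost() == null) {
            // Relative or opaque URIs carry no origin to compare.
            return false;
        }
        return document.getScheme().equalsIgnoreCase(resource.getScheme())
                && document.getHost().equalsIgnoreCase(resource.getHost())
                && effectivePort(document) == effectivePort(resource);
    }

    /** Maps an unspecified port to the scheme default for http and https. */
    private static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        if ("http".equals(scheme)) {
            return 80;
        }
        if ("https".equals(scheme)) {
            return 443;
        }
        return -1;
    }

    public static void main(String[] args) {
        URI doc = URI.create("https://example.com/drawings/a.svg");
        // A relative reference can only be resolved when the document URI is known.
        URI image = doc.resolve("../img/b.png");
        System.out.println(image + " same origin: " + sameOrigin(doc, image));
    }
}
```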
TranscoderInput can also take a DOM Document as input. However, the transcoder expects the document to have been built with one of its two SVG DOM implementations; otherwise it simply imports the old document into a new one and then discards the old DOM. This wastes resources.
In addition, the application does not always pick the right DOM implementation, or may fail to set the correct KEY_DOM_IMPLEMENTATION hint, which has been the source of some application bugs (for example, SVG 1.2 elements appearing in 1.1 documents and causing class cast exceptions).
It is also worth noting that one bug in the area of DOM imports was fixed in EchoSVG, which suggests that this input path was not as well tested as the others (although everything should be fine in that area now).
Conclusions
There are other, smaller mistakes that developers can make when using those input wrappers. Instead of the wrappers, I suggest using simple interfaces that take on the minimum responsibility and state what information they really need. Those interfaces will not be as flexible as the ones using the old wrappers, but should lead to better security and a better use of resources.
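As one possible shape for such an interface, the sketch below asks for exactly two things: already-decoded characters and the document URI. The names DocumentProcessor and JaxpProcessor are hypothetical and do not belong to EchoSVG or any other library; the JAXP-based implementation is only there to show that a narrow interface can sit on top of an existing parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical types, for illustration only.
public class NarrowInputSketch {

    /** Takes decoded characters and the document URI, nothing else. */
    public interface DocumentProcessor<T> {
        T process(Reader characters, URI documentUri) throws IOException;
    }

    /** Example implementation on top of the JDK's XML parser. */
    public static final class JaxpProcessor implements DocumentProcessor<Document> {
        @Override
        public Document process(Reader characters, URI documentUri) throws IOException {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
                // The legacy wrapper is confined to this one place.
                InputSource source = new InputSource(characters);
                source.setSystemId(documentUri.toString());
                Document doc = factory.newDocumentBuilder().parse(source);
                doc.setDocumentURI(documentUri.toString());
                return doc;
            } catch (ParserConfigurationException | SAXException e) {
                throw new IOException("Could not parse " + documentUri, e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Document doc = new JaxpProcessor().process(
                new StringReader("<svg xmlns='http://www.w3.org/2000/svg'/>"),
                URI.create("https://example.com/a.svg"));
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

Encoding detection, connection handling and redirect policy stay with the caller, which is the trade-off described above: less flexibility in exchange for clearer responsibilities.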