
Add PropertyString, cleanHtml helper and escapeHtmlExt helper. #4004

Draft
wants to merge 9 commits into
base: dev

Conversation

EreMaijala
Contributor

Related to #3998, this is a draft for PropertyString that can carry additional information along with a plain text string. It now has explicit support for HTML content, and there's also a cleanHtml helper to handle sanitization. escapeHtmlExt is a replacement for escapeHtml allowing the HTML version to be returned if desired.

@maccabeelevine What do you think? Obviously this needs some template changes where HTML is desirable, but should otherwise be fairly easy to use.

TODO:

  • Add unit tests

@EreMaijala EreMaijala marked this pull request as draft October 11, 2024 10:20
Member

@demiankatz demiankatz left a comment

Thanks, @EreMaijala, looks like a great start! A few minor thoughts/suggestions...

* @license http://opensource.org/licenses/gpl-2.0.php GNU General Public License
* @link https://vufind.org/wiki/development Wiki
*/
class EscapeHtmlExt extends \Laminas\View\Helper\Escaper\AbstractHelper
Member

I don't love this name, but I also can't think of a better one... :-)

Contributor Author

Me neither. I wanted to extend EscapeHtml, but the silly @final annotation prevents that. And while we could substitute our own EscapeHtml, the Laminas class is used in so many places that it's not quite straightforward. We could of course change all the references to an interface, but that would require more wide-ranging changes. If you think that'd be a better way forward, I'd be happy to work on that. One argument in favor is that EscapeHtmlAttr needs a substitute as well (to be able to control the IE-compatibility of the escaping process), so doing both at the same time would probably make sense.

Member

I don't have really strong feelings on this -- no solution feels obviously like the best. Maybe @maccabeelevine will have some thoughts to share... ;-)

Member

> And while we could substitute our own EscapeHtml, the Laminas class is used in so many places, that it's not quite straightforward.

This was certainly my original idea with #3998, to make the template changes as simple as possible, and it's safer here than in my PoC implementation since it would do nothing unless you had a PropertyStringInterface value and passed allowHtml. But I'm sure there are complications I'm not seeing.

Member

A month later, a fresh thought on the name: would escapeOrCleanHtml be more descriptive than escapeHtmlExt, since that's what this is doing -- escaping the HTML unless told to clean it instead?

Member

If Ere agrees, I like that naming!

Contributor Author

I like it a lot better than escapeHtmlExt!
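
For readers following the naming discussion, the "escape unless told to clean" behavior can be sketched roughly as below. This is only an illustration, not the code in this PR: `HtmlAwareString` is a hypothetical stand-in for PropertyString (the real class and its method names may differ), the free function stands in for the view helper, and the `$cleanHtml` callable stands in for the cleanHtml helper.

```php
<?php
// Hypothetical stand-in for the PR's PropertyString; the real class may differ.
final class HtmlAwareString
{
    public function __construct(
        private string $text,
        private ?string $html = null
    ) {
    }

    // Returns the HTML version, or null if the value only has plain text.
    public function getHtml(): ?string
    {
        return $this->html;
    }

    public function __toString(): string
    {
        return $this->text;
    }
}

/**
 * Escape the value unless the caller allows HTML AND the value carries HTML,
 * in which case the HTML is passed through the supplied cleaner instead.
 *
 * $cleanHtml is an assumed callable (string => string) standing in for a
 * real sanitizer.
 */
function escapeOrCleanHtml(
    string|HtmlAwareString $value,
    callable $cleanHtml,
    bool $allowHtml = false
): string {
    if (
        $allowHtml
        && $value instanceof HtmlAwareString
        && null !== $value->getHtml()
    ) {
        return $cleanHtml($value->getHtml());
    }
    // Plain strings are always escaped, even when $allowHtml is true.
    return htmlspecialchars((string)$value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

The point of the sketch is that `allowHtml` is a no-op for ordinary strings, so a template can pass it unconditionally without changing output for record drivers that never return HTML.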

Member

@maccabeelevine maccabeelevine left a comment

I love the idea here, and it's so much better than the original #3998 PoC. I added some specific questions, but I guess my primary question (and maybe the hardest to answer) is how we deal with generic templates that may or may not have HTML in certain fields for some backends.

So, would it be appropriate in the default record driver's core.html to change

<h1<?=$this->schemaOrg()->getAttributes(['property' => 'name'])?>><?=$this->escapeHtml($this->driver->getShortTitle() . ' ' . $this->driver->getSubtitle() . ' ' . $this->driver->getTitleSection())?></h1>

to

<h1<?=$this->schemaOrg()->getAttributes(['property' => 'name'])?>><?=$this->escapeHtmlExt($this->driver->getShortTitle(), allowHtml: true)?> <?=$this->escapeHtml($this->driver->getSubtitle() . ' ' . $this->driver->getTitleSection())?></h1>

Performance-wise, this should be ok, since for any other record driver the title would not be a PropertyString and so the allowHtml would be ignored. But is it ok from a template complexity standpoint?


@demiankatz
Member

Regarding @maccabeelevine's question about changing the default driver's core.html to support HTML, I don't think I would have a problem with that in the interest of greater flexibility....

@EreMaijala
Contributor Author

@demiankatz Do we want to support HTML for all fields or for longer textual fields like summary? Different choices will lead to different complications e.g. with search term highlighting.

@demiankatz
Member

> @demiankatz Do we want to support HTML for all fields or for longer textual fields like summary? Different choices will lead to different complications e.g. with search term highlighting.

If we wanted to start small, I would think the bare minimum would be longer textual fields and titles. But I'm not opposed to rolling out the support more widely if it's easier to do it all at once than piecemeal. I tend to favor the most pragmatic strategy, whatever that turns out to be. :-)

@EreMaijala
Contributor Author

Here are a couple of thoughts:

  1. Could add a default "allow HTML" toggle and/or setting to the escaper so that the parameter doesn't need to be repeated. Downside: easy to accidentally leave enabled outside of the intended scope.
  2. Could add configurable defaults per field type ('title', 'alt-title', 'summary' etc.). This would allow specifying the fields with HTML in configuration.
  3. truncate helper is going to have trouble with HTML. Perhaps we can make it smarter, but it won't be easy.
  4. titles will be a problem with any tags that you can't have inside a heading like <h1>. Perhaps we need the field type from 2. to also indicate which tags are allowed.
  5. escapeHtmlExt needs to deviate a bit further from the invoke params available with Laminas' helper. It seems that it'd be better to decouple.

@demiankatz
Member

> 1. Could add a default "allow HTML" toggle and/or setting to the escaper so that the parameter doesn't need to be repeated. Downside: easy to accidentally leave enabled outside of the intended scope.

I think the downsides probably outweigh the benefits here, since I can't think of any scenario where this doesn't potentially lead to unexpected side effects or confusion. I think it's better to be explicit.

> 2. Could add configurable defaults per field type ('title', 'alt-title', 'summary' etc.). This would allow specifying the fields with HTML in configuration.

...and this would be a good way to be explicit in a number of contexts.

> 3. truncate helper is going to have trouble with HTML. Perhaps we can make it smarter, but it won't be easy.

Possible lazy solution: truncate based on a tag-stripped version. If the stripped version is under the limit, display the HTML as-is. If the stripped version is too long, show the truncated, stripped version, and the user will have to "see more" to get the rich version. Not ideal, but maybe a quick place to start.
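
A minimal sketch of that lazy approach, assuming the input HTML has already been sanitized. The function name `truncateHtmlLazily` and the shape of the returned array are hypothetical and not part of VuFind's truncate helper:

```php
<?php
/**
 * Hypothetical helper: choose between the rich and the truncated plain
 * version of a value, measuring length on the tag-stripped text.
 *
 * @param string $html  HTML that has ALREADY been sanitized
 * @param int    $limit Maximum number of visible characters
 *
 * @return array{content: string, isHtml: bool, truncated: bool}
 */
function truncateHtmlLazily(string $html, int $limit): array
{
    // The text a reader would actually see, with entities decoded.
    $plain = trim(
        html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8')
    );
    // Split into UTF-8 characters using PCRE only; mb_strlen()/mb_substr()
    // would do the same job where the mbstring extension is available.
    $chars = preg_split('//u', $plain, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    if (count($chars) <= $limit) {
        // Short enough: show the rich version as-is.
        return ['content' => $html, 'isHtml' => true, 'truncated' => false];
    }

    // Too long: fall back to a truncated, escaped plain-text version.
    $short = rtrim(implode('', array_slice($chars, 0, $limit))) . '...';
    return [
        'content' => htmlspecialchars($short, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        'isHtml' => false,
        'truncated' => true,
    ];
}
```

One known rough edge: strip_tags() joins text from adjacent block elements without inserting whitespace, so the plain version may need extra handling for multi-paragraph content.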

> 4. titles will be a problem with any tags that you can't have inside a heading like `<h1>`. Perhaps we need the field type from 2. to also indicate which tags are allowed.

Maybe we need the ability to define multiple named tag allow-lists. Then we could have a "default" list and a "heading" list and the configurable settings could refer to an allow-list, or "none". This would empower users to create their own more granular lists as needed, but we could use these two obvious ones as a starting point.
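
As a rough illustration of named allow-lists, the lookup could work something like the sketch below. The list names, tag choices, field-type mapping and function name are all hypothetical; none of them come from this PR or from VuFind's configuration, and the resolved list would still need to be handed to a real sanitizer.

```php
<?php
// Hypothetical named allow-lists; the tags shown are examples only.
const TAG_ALLOW_LISTS = [
    'default' => ['p', 'br', 'b', 'i', 'em', 'strong', 'ul', 'ol', 'li', 'a'],
    // Only inline tags that are valid inside a heading such as <h1>.
    'heading' => ['b', 'i', 'em', 'strong', 'sub', 'sup'],
];

// Hypothetical per-field-type setting: an allow-list name, or "none".
const FIELD_ALLOW_LISTS = [
    'title' => 'heading',
    'summary' => 'default',
    'author' => 'none',
];

/**
 * Resolve the allowed tags for a field type.
 *
 * Returns null when HTML is not allowed at all (unknown field type, "none",
 * or a reference to an undefined list), so the caller can fall back to
 * plain escaping.
 */
function resolveAllowedTags(string $fieldType): ?array
{
    $listName = FIELD_ALLOW_LISTS[$fieldType] ?? 'none';
    return TAG_ALLOW_LISTS[$listName] ?? null;
}
```

Treating every unknown case as "no HTML" keeps the behavior explicit, in line with the preference stated in the reply to point 1.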

> 5. escapeHtmlExt needs to deviate a bit further from the invoke params available with Laminas' helper. It seems that it'd be better to decouple.

Given the way Laminas seems to be gradually cutting off the ability to extend anything, I agree that building tools that meet our needs and wrap around Laminas' public interfaces is better than trying to build upon those interfaces directly.

@demiankatz demiankatz deleted the branch vufind-org:dev November 1, 2024 18:01
@demiankatz demiankatz closed this Nov 1, 2024
@demiankatz demiankatz reopened this Nov 5, 2024
@demiankatz demiankatz changed the base branch from dev-11.0 to dev November 5, 2024 14:33
@demiankatz demiankatz added this to the 11.0 milestone Nov 5, 2024
@demiankatz
Member

This PR was accidentally closed by the deletion of the dev-11.0 branch; I have restored and reopened it. Sorry for the inconvenience!

Member

@demiankatz demiankatz left a comment

I took another look at this and had just a few more minor thoughts and questions...


'<br>',
array_map(
function ($summary) {
$htmlContent = str_starts_with($summary, '<') && str_ends_with($summary, '>');
Member

I wonder if there's a way to eliminate this hacky check for angle brackets. Should we do the check in the record driver and wrap the string in a PropertyString? Should the new helper have a switch to enable this check, or even make this behavior part of the standard "allowHtml"? I'm not totally sure, but it feels like we can do better here, especially since as written, there's no safety on the summary in the special case.

// which ensures that the libxml2 options (namely keepBlanks) are set up
// properly, and whitespace nodes are preserved. This should not be an
// issue from libxml2 version 2.9.5, but during testing the issue was
// still intermittently present. Regardless of that, CentOS 7.x have an
Member

Is this note about CentOS 7 still relevant now that it's so thoroughly EOL'ed?


3 participants