Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IBX-7987: Node filter for extraction text #155

Open
wants to merge 3 commits into
base: 4.6
Choose a base branch
from

Conversation

adamwojs
Copy link
Member

@adamwojs adamwojs commented Mar 19, 2024

Question Answer
JIRA issue IBX-7987
Bug/Improvement yes
New feature yes
Target version 4.6+
BC breaks no
Tests pass yes
Doc needed yes

Background

See JIRA issue.

Node filtering

This PR introduced extension point allowing to exclude nodes from text extraction:

namespace Ibexa\Contracts\FieldTypeRichText\RichText\TextExtractor;

interface NodeFilterInterface
{
    /**
     * Return false to preserve the node, true to remove it.
     */
    public function filter(DOMNode $node): bool;
}

\Ibexa\Contracts\FieldTypeRichText\RichText\TextExtractor\NodeFilterInterface implementations should be tagged via ibexa.field_type.richtext.text_extractor.node_filter

Common use case

\Ibexa\Contracts\FieldTypeRichText\RichText\TextExtractor\NodeFilterFactoryInterface 1 covers the common use case: excluding node with given path, e.g. /foo/bar/baz.

interface NodeFilterFactoryInterface
{
    public function createPathFilter(string ...$path): NodeFilterInterface;
}

Usage example:

services:
    # ...
    app.field_type.richtext.text_extractor.node_filter.custom:
        class: Ibexa\Contracts\FieldTypeRichText\RichText\TextExtractor\NodeFilterInterface
        factory: ['@Ibexa\Contracts\FieldTypeRichText\RichText\TextExtractor\NodeFilterFactoryInterface', 'createPathFilter']
        arguments: ['foo', 'bar', 'baz']
        tags:
            - { name: ibexa.field_type.richtext.text_extractor.node_filter }

1 The Factory pattern prevents the exposure of implementation details within contracts

Nodes excluded out of the box

The following nodes are excluded by default (xpath syntax):

  • //eztemplate/ezconfig

Notes for QA

Identify other potential elements which should be excluded from full text index

TODO:

  • Implement feature / fix a bug.
  • Implement tests.
  • Fix new code according to Coding Standards ($ composer fix-cs).
  • Ask for Code Review.

@adamwojs adamwojs force-pushed the filter_node_for_text_extraction branch from b77b0f1 to 8193162 Compare March 20, 2024 06:10
@adamwojs adamwojs changed the title Node filter for extraction text IBX-7987: Node filter for extraction text Mar 20, 2024
@adamwojs adamwojs added the Doc needed The changes require some documentation label Mar 20, 2024
@adamwojs adamwojs force-pushed the filter_node_for_text_extraction branch from ef8a6e0 to f9aaba8 Compare March 27, 2024 06:59
@adamwojs adamwojs marked this pull request as ready for review March 27, 2024 09:32
@adamwojs adamwojs requested review from alongosz, konradoboza and a team March 27, 2024 09:32
@adamwojs adamwojs changed the base branch from main to 4.6 March 27, 2024 09:40
@adamwojs adamwojs force-pushed the filter_node_for_text_extraction branch from f9aaba8 to 40499a9 Compare March 27, 2024 09:41
Copy link
Contributor

@konradoboza konradoboza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach and convenient usage. LGTM.

@konradoboza konradoboza requested a review from a team March 27, 2024 10:24
Comment on lines +18 to +21
/**
* Return false to preserve the node, true to remove it.
*/
public function filter(DOMNode $node): bool;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the result of false/true combined with filter might be a bit misleading. My first thought was that it should work the opposite way. Maybe changing it to filterOut would be more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personalny stand with the original naming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to agree with @alongosz here.

Most methods (ArrayCollection::filter) and functions (array_filter) work in the opposite way:

  • when true, an entry is preserved
  • when false, an entry is removed

Therefore, I'd suggest reversing the logic to comply with generally established PHP practice.

@alongosz alongosz requested a review from a team March 29, 2024 16:26
Copy link
Contributor

@Steveb-p Steveb-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Before giving approval, would love to see a test that includes both FullTextExtractor and filters that you're using, as we're solving a specific need - to filter out nodes when working with FullTextExtractor, right?

Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Comment on lines +18 to +21
/**
* Return false to preserve the node, true to remove it.
*/
public function filter(DOMNode $node): bool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to agree with @alongosz here.

Most methods (ArrayCollection::filter) and functions (array_filter) work in the opposite way:

  • when true, an entry is preserved
  • when false, an entry is removed

Therefore, I'd suggest reversing the logic to comply with generally established PHP practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Doc needed The changes require some documentation Ready for QA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants