Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify inner_text implementation using lxml's text method #718

Merged
merged 2 commits into from
Jun 22, 2024

Conversation

apasel422
Copy link
Contributor

@apasel422 apasel422 commented Jun 22, 2024

Rather than using regex to remove tags and attributes after the fact.

https://lxml.de/api/lxml.etree-module.html#tostring

This also eliminates the need to perform HTML unescaping.

On my local machine, this reduces the time spent in the inner_text method on Don Quixote from 1.5 seconds to 0.04 seconds.

We add a basic test for the inner_text method in a preliminary commit to demonstrate that the behavior of the new implementation is equivalent.

Rather than using regex to remove tags and attributes after the fact.

https://lxml.de/api/lxml.etree-module.html#tostring

This also eliminates the need to perform HTML unescaping.

On my local machine, this reduces the time spent in the inner_text
method on Don Quixote from 1.5 seconds to 0.04 seconds.
@apasel422 apasel422 marked this pull request as ready for review June 22, 2024 13:41
@@ -34,3 +34,14 @@ def test_add_landmark_with_title():
add_landmark(dom, "file", landmarks)

assert landmarks[0].title == "Bar"

def test_inner_text():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, glad to see we're getting some tests in here. Do you have to also rebuild the test golden files when adding a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure: I added this test and re-ran pytest and everything worked fine.

@apasel422 apasel422 requested a review from acabal June 22, 2024 16:55
@acabal
Copy link
Member

acabal commented Jun 22, 2024

@apasel422
Copy link
Contributor Author

I think you do, see https://github.com/standardebooks/tools/blob/master/tests/README.md#creating-a-test

That doesn't seem to apply to test_internals.py. (If I deliberately make the new test incorrect via a faulty assertion, it fails as expected when I run pytest).

@acabal acabal merged commit ad56328 into standardebooks:master Jun 22, 2024
1 check passed
@acabal
Copy link
Member

acabal commented Jun 22, 2024

OK then, thanks!

@apasel422 apasel422 deleted the inner-text branch June 22, 2024 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants