diff --git a/README.md b/README.md index 4f2dd19d..d24cf5c5 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ ## Powerful and flexible engine for BeautifulSoup -[![PyPI](https://img.shields.io/pypi/v/soupsavvy?color=orange)](https://pypi.org/project/soupsavvy/) [![Python Versions](https://img.shields.io/pypi/pyversions/soupsavvy)](https://www.python.org/) [![Codecov](https://codecov.io/gh/sewcio543/soupsavvy/graph/badge.svg?token=RZ51VS3QLB)](https://codecov.io/gh/sewcio543/soupsavvy) [![Docs link](https://img.shields.io/badge/docs-read-blue)](https://sewcio543.github.io/soupsavvy/) +[![PyPI](https://img.shields.io/pypi/v/soupsavvy?color=orange)](https://pypi.org/project/soupsavvy/) [![Python Versions](https://img.shields.io/pypi/pyversions/soupsavvy)](https://www.python.org/) [![Codecov](https://codecov.io/gh/sewcio543/soupsavvy/graph/badge.svg?token=RZ51VS3QLB)](https://codecov.io/gh/sewcio543/soupsavvy) [![Docs link](https://img.shields.io/badge/docs-read-blue)](https://soupsavvy.readthedocs.io/en/latest/) ## Table of Contents @@ -34,15 +34,15 @@ pip install soupsavvy ## Documentation -Full documentation can be found at **[documentation](https://sewcio543.github.io/soupsavvy/)**. +Full documentation can be found at **[documentation](https://soupsavvy.readthedocs.io/en/latest/)**. ## Demos -For more information about the package, its concepts and usage, read `Demos` section of the **[documentation](https://sewcio543.github.io/soupsavvy)**. It's step by step guide to the most important features of the package. +For more information about the package, its concepts and usage, read `Demos` section of the **[documentation](https://soupsavvy.readthedocs.io/en/latest/)**. It's step by step guide to the most important features of the package. ## Contributing -If you'd like to contribute to soupsavvy, feel free to check out the [GitHub repository](https://github.com/sewcio543/soupsavvy) and submit pull requests into one of development branches. 
Any feedback, bug reports, or feature requests are welcome! +If you'd like to contribute to soupsavvy, feel free to check out the [GitHub repository](https://github.com/sewcio543/soupsavvy) and submit pull requests into one of development branches. Any feedback, bug reports, or feature requests are welcome! In case of any doubts, follow [Contribution Guidelines](https://github.com/sewcio543/soupsavvy/blob/main/CONTRIBUTING.md) ## License @@ -51,7 +51,7 @@ If you'd like to contribute to soupsavvy, feel free to check out the [GitHub rep ## Acknowledgements -`soupsavvy` is built upon the foundation of excellent BeautifulSoup. We extend our gratitude to the developers and contributors of this projects for their invaluable contributions to the Python community and making our life a lot easier! +`soupsavvy` is built upon the foundation of excellent `BeautifulSoup`. We extend our gratitude to the developers of this projects for their invaluable contributions to the Python community and making our life a lot easier! ----------------- diff --git a/demos/about.ipynb b/demos/about.ipynb index 910df4f7..b3ab7261 100644 --- a/demos/about.ipynb +++ b/demos/about.ipynb @@ -550,7 +550,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Summary" + "## Conclusion" ] }, { diff --git a/demos/combining.ipynb b/demos/combining.ipynb index f73b71df..1343fa5d 100644 --- a/demos/combining.ipynb +++ b/demos/combining.ipynb @@ -2132,7 +2132,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Summary" + "## Conclusion" ] }, { diff --git a/demos/css.ipynb b/demos/css.ipynb new file mode 100644 index 00000000..f2594146 --- /dev/null +++ b/demos/css.ipynb @@ -0,0 +1,736 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# CSS Selectors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`soupsavvy.selectors.css` is a subpackage that provides a range of selectors based on css. 
These are wrappers that use the [`soupsieve`](https://github.com/facelessuser/soupsieve) library under the hood, which is a *modern CSS selector implementation for BeautifulSoup*. This module includes a variety of selectors that can be combined with other `soupsavvy` selectors, using pure CSS or the most commonly used CSS pseudo-classes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Child Selectors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Child selectors are designed to select elements based on their position relative to their siblings within a parent element. While the `nth-child` selector can theoretically handle any position-based selection, `soupsavvy` offers convenient wrappers for a few commonly used CSS pseudo-classes." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### FirstChild\n", + "\n", + "The `FirstChild` selector selects every element that is the first child of its parent. This selector is equivalent to the CSS `:first-child` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":first-child\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import FirstChild\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

First

\n", + "
\n", + " First\n", + " \n", + " First\n", + " \n", + "
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = FirstChild()\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### LastChild\n", + "\n", + "The `LastChild` selector selects every element that is the last child of its parent. This is equivalent to the CSS `:last-child` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":last-child\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import LastChild\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

\n", + "
\n", + "
\n", + " \n", + " Last\n", + " \n", + " Last\n", + "
\n", + "
Last
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = LastChild()\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### NthChild\n", + "\n", + "The `NthChild` selector allows you to select elements based on their position among their siblings using a CSS-style `nth-child` expression.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":nth-child(3)\n", + "```\n", + "\n", + "`NthChild` and all other nth selectors fully support all valid CSS `nth` parameter values, enabling you to select elements based on their position among siblings using the same syntax as in CSS.\n", + "\n", + "```css\n", + ":nth-child(2n)\n", + ":nth-child(odd)\n", + ":nth-child(even)\n", + ":nth-child(-n+2)\n", + "```\n", + "\n", + "```python\n", + "NthChild('2n')\n", + "NthChild('odd')\n", + "NthChild('even')\n", + "NthChild('-n+2')\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import NthChild\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

1

\n", + "

2

\n", + "

3

\n", + "

4

\n", + "

5

\n", + "

6

\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = NthChild(\"2n\")\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### NthLastChild\n", + "\n", + "The `NthLastChild` selector allows you to select elements based on their position among their siblings, counting from the last child of the parent element. This is equivalent to the CSS `:nth-last-child` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":nth-last-child(3)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import NthLastChild\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

1

\n", + "

2

\n", + "

3

\n", + "

4

\n", + "

5

\n", + "

6

\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = NthLastChild(\"odd\")\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### OnlyChild\n", + "\n", + "The `OnlyChild` selector matches elements that are the only child of their parent. This mirrors the functionality of the CSS `:only-child` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":only-child\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import OnlyChild\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

Text

\n", + "
\n", + " \n", + " Text\n", + "
\n", + "

Only child

\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = OnlyChild()\n", + "selector.find(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Type selectors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Type selectors are used to select elements based on their position relative to siblings of the same type. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### FirstOfType\n", + "\n", + "Selects every element that is the first child of its parent of a particular type. This selector is equivalent to the CSS `:first-of-type` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":first-of-type\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import FirstOfType\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

First p

\n", + "
First div
\n", + "
\n", + " First span\n", + " \n", + " First a\n", + " \n", + "
\n", + "

\n", + "
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = FirstOfType()\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### LastOfType\n", + "\n", + "Selects every element that is the last child of its parent of a particular type. This selector is equivalent to the CSS `:last-of-type` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":last-of-type\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import LastOfType\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

Last p

\n", + "
\n", + " \n", + " Last a\n", + " \n", + " Last span\n", + "
\n", + "
Last div
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = LastOfType()\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### NthOfType\n", + "\n", + "Selects every element that is the nth child of its parent of a particular type. This selector is equivalent to the CSS `:nth-of-type` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":nth-of-type(n)\n", + "```\n", + "As with the nth-child counterparts, every valid CSS `nth` parameter value is supported." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import NthOfType\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

1

\n", + " 1\n", + "

2

\n", + " 2\n", + "

3

\n", + " 3\n", + "

4

\n", + " 4\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = NthOfType(\"2n+2\")\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### NthLastOfType\n", + "\n", + "Selects every element that is the nth child of its parent of a particular type, counting from the last child.\n", + "This selector is equivalent to the CSS `:nth-last-of-type` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":nth-last-of-type(n)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import NthLastOfType\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

1

\n", + " 1\n", + "

2

\n", + " 2\n", + "

3

\n", + " 3\n", + "

4

\n", + " 4\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = NthLastOfType(\"-n+2\")\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### OnlyOfType\n", + "\n", + "Selects every element that is the only child of its parent of a particular type. This selector is equivalent to the CSS `:only-of-type` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":only-of-type\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import OnlyOfType\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "
\n", + " First span\n", + " Second span\n", + "
\n", + "

Only p

\n", + "
\n", + " Only span\n", + " Only a\n", + "
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = OnlyOfType()\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Other selectors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Empty\n", + "\n", + "Selects every element that has no children and no text content. This selector is equivalent to the CSS `:empty` pseudo-class.\n", + "\n", + "**CSS Example:**\n", + "```css\n", + ":empty\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import Empty\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

Text

\n", + "
\n", + " \n", + " Text\n", + " \n", + " \n", + "
\n", + "
Text
\n", + "

\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = Empty()\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### CSS\n", + "\n", + "`CSS` is a wrapper for any CSS selector. It uses `soupsieve` under the hood, so the supported syntax depends on the installed `soupsieve` version.\n", + "It is a convenience class for searching with a CSS selector; the results are equivalent to those of the `BeautifulSoup` `select` and `select_one` methods.\n", + "\n", + "**Using BeautifulSoup:**\n", + "```python\n", + "soup.select_one('div > p')\n", + "```\n", + "\n", + "**Using soupsieve:**\n", + "```python\n", + "soupsieve.select_one('div > p', soup)\n", + "```\n", + "\n", + "**Using soupsavvy:**\n", + "```python\n", + "CSS('div > p').find(soup)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import CSS\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "
Not span
\n", + " Not first\n", + "
Not .foo
\n", + "
Found
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = CSS(\"span.foo:first-child\")\n", + "selector.find(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Combining selectors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "CSS-based selectors can be combined with other `soupsavvy` selectors to create composite selectors. For example, to select all elements that are not empty and are children of a `div` element, the following selector can be used:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy import TypeSelector\n", + "from soupsavvy.selectors.css import Empty\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

Text

\n", + "
\n", + " \n", + " Text\n", + " \n", + "

\n", + "
\n", + "

\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = TypeSelector(\"div\") > (~Empty())\n", + "selector.find_all(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find all elements that have exactly one child and are the last child of their parent, the following selector can be used:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy import Anchor, HasSelector\n", + "from soupsavvy.selectors.css import LastChild, OnlyChild\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + "

Text

\n", + "
\n", + " \n", + " Text\n", + "
\n", + "
Only Child
\n", + "
Only Child - Last
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "only_child = Anchor > OnlyChild()\n", + "selector = HasSelector(only_child) & LastChild()\n", + "selector.find(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Recursivity\n", + "\n", + "Unlike the `find` methods, the `BeautifulSoup` `select` method does not provide an option to search only the direct children of an element (`recursive=False`). This is justified, as CSS selectors have an implied anchor of the universal selector (`*`), which selects all elements. `soupsavvy` consistently allows searching in **non-recursive** mode by setting `recursive=False` in the `find` methods. In that case, only direct children of the element that match the selector are returned." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy.selectors.css import CSS\n", + "\n", + "soup = BeautifulSoup(\n", + " \"\"\"\n", + " \n", + "
\n", + "
Descendant
\n", + "
\n", + "
Child
\n", + " \"\"\",\n", + " features=\"html.parser\",\n", + ")\n", + "\n", + "selector = CSS(\"div.foo\")\n", + "selector.find(soup, recursive=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`soupsavvy` provides a convenient way to select elements using css selectors. It provides wrappers for commonly used pseudo-classes that share the same implementation as other selectors and can be easily combined." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Enjoy `soupsavvy` and leave us feedback!** \n", + "**Happy scraping!**" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "soupsavvy", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/demos/selectors.ipynb b/demos/selectors.ipynb index ee69b6c6..3ed61553 100644 --- a/demos/selectors.ipynb +++ b/demos/selectors.ipynb @@ -604,7 +604,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Summary" + "## Conclusion" ] }, { diff --git a/demos/testing.ipynb b/demos/testing.ipynb new file mode 100644 index 00000000..9e482a25 --- /dev/null +++ b/demos/testing.ipynb @@ -0,0 +1,436 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Testing" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`soupsavvy` provides utilities for testing your selectors. By leveraging the features of the `soupsavvy.testing` subpackage, you can validate your selectors, check them against various edge cases, and ensure they work as expected." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generators\n", + "\n", + "The subpackage includes HTML code generators designed specifically for testing purposes. These generators help you create controlled HTML structures to simulate various scenarios, ensuring your selectors perform as expected." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attribute Generator\n", + "\n", + "The `AttributeGenerator` generates the string representation of HTML attributes. While not very useful on its own, it becomes valuable when combined with the `TagGenerator`, offering extensive customization options." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Empty Attribute\n", + "\n", + "If only the first parameter (the attribute name) is passed to the `AttributeGenerator`, it generates an attribute with an empty value." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import AttributeGenerator\n", + "\n", + "generator = AttributeGenerator(\"class\")\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Constant value\n", + "\n", + "By passing the `value` parameter, you can set a specific value for the attribute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import AttributeGenerator\n", + "\n", + "generator = AttributeGenerator(\"class\", value=\"book\")\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Templates\n", + "\n", + "Templates add another layer of customization by generating strings based on predefined logic, useful for creating dynamic and varied content in your test HTML." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### ChoiceTemplate\n", + "\n", + "The `ChoiceTemplate` allows you to generate a string by randomly selecting from a provided list of strings. For reproducibility, the `seed` parameter can be set to ensure the same output is generated across multiple runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import AttributeGenerator, ChoiceTemplate\n", + "\n", + "template = ChoiceTemplate([\"book\", \"article\", \"blog\"], seed=42)\n", + "generator = AttributeGenerator(\"class\", value=template)\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### RandomTemplate\n", + "\n", + "The `RandomTemplate` generates a string from randomly selected characters. The `length` parameter defines the string length (default is 4). Like `ChoiceTemplate`, the `seed` parameter ensures consistent output if needed." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import AttributeGenerator, RandomTemplate\n", + "\n", + "template = RandomTemplate(length=5, seed=42)\n", + "generator = AttributeGenerator(\"class\", value=template)\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### User-defined Templates\n", + "\n", + "For advanced customization, you can create your own templates by subclassing `soupsavvy.testing.BaseTemplate` and implementing the `generate` method to return a string based on your specific logic.\n", + "\n", + "Here’s how you can define a custom template:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import BaseTemplate, TagGenerator\n", + "\n", + "\n", + "class CustomTemplate(BaseTemplate):\n", + " def __init__(self, connection): ...\n", + "\n", + " def generate(self):\n", + " # connects to external service\n", + " result = \"Hello from somewhere!\"\n", + " return result\n", + "\n", + "\n", + "template = CustomTemplate(connection=None)\n", + "generator = TagGenerator(\"span\", text=template)\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### TagGenerator\n", + "\n", + "`TagGenerator` is the primary tool for generating single HTML tags with customizable attributes, text, and child elements." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Name\n", + "\n", + "The `name` parameter is required and specifies the tag name, such as `div`, `span`, or `p`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import TagGenerator\n", + "\n", + "generator = TagGenerator(\"div\")\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Attributes\n", + "\n", + "The `attrs` parameter allows you to define the attributes of the tag. It accepts an iterable containing:\n", + "\n", + "- `str`: Just the attribute name, resulting in an empty value.\n", + "- `tuple`: A pair where the first element is the attribute name and the second is the value.\n", + "- `AttributeGenerator`: An object that dynamically generates attribute values.\n", + "\n", + "All attributes are converted to `AttributeGenerator` objects, enabling the use of templates for dynamic generation. Duplicate attributes will raise an error." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import AttributeGenerator, RandomTemplate, TagGenerator\n", + "\n", + "attrs = (\n", + " \"href\",\n", + " (\"class\", \"link\"),\n", + " (\"data-id\", RandomTemplate(seed=42)),\n", + " AttributeGenerator(\"title\", value=\"buy\"),\n", + ")\n", + "generator = TagGenerator(\"a\", attrs=attrs)\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Attributes of the tag must be unique, so trying to define `TagGenerator` with duplicate attributes will raise an error." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import soupsavvy.exceptions as exc\n", + "from soupsavvy.testing import TagGenerator\n", + "\n", + "try:\n", + " generator = TagGenerator(\"a\", attrs=[\"href\", \"href\"])\n", + "except exc.SoupsavvyException as e:\n", + " print(type(e))\n", + " print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Children\n", + "\n", + "The `children` parameter lets you specify the tag's children, which must be `TagGenerator` objects. If no children are specified, the tag is created without any." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import TagGenerator\n", + "\n", + "child_generator = TagGenerator(\"span\")\n", + "generator = TagGenerator(\n", + " \"div\",\n", + " attrs=[\"class\"],\n", + " children=[child_generator],\n", + ")\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Self-closing Tags\n", + "\n", + "Self-closing tags like `br` are automatically handled. Defining a self-closing tag with children will raise an error." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import TagGenerator\n", + "\n", + "generator = TagGenerator(\"br\")\n", + "generator.generate()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import TagGenerator\n", + "\n", + "try:\n", + " generator = TagGenerator(\"hr\", children=[TagGenerator(\"span\")])\n", + "except exc.SoupsavvyException as e:\n", + " print(type(e))\n", + " print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Text\n", + "\n", + "The `text` parameter allows you to add text content to the tag. 
This can be a static string or dynamically generated using templates." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import TagGenerator\n", + "\n", + "generator = TagGenerator(\"span\", text=\"Hello, World!\")\n", + "generator.generate()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from soupsavvy.testing import ChoiceTemplate, TagGenerator\n", + "\n", + "template = ChoiceTemplate([\"Hello, World!\", \"Hello, blog!\"], seed=42)\n", + "generator = TagGenerator(\"span\", text=template)\n", + "generator.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Usage\n", + "\n", + "Let's see how to use these generators in practice. In this example, we'll test a selector targeting `span` elements with text content starting with \"Hello\" that are children of `div` elements with `class` attribute of `book` and `role` attribute of `section`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "from bs4 import BeautifulSoup\n", + "\n", + "from soupsavvy import AttributeSelector, ClassSelector, PatternSelector, TypeSelector\n", + "from soupsavvy.testing import AttributeGenerator, ChoiceTemplate, TagGenerator\n", + "\n", + "# 1: define the generator\n", + "template = ChoiceTemplate([\"Hello, World!\", \"Hello, blog!\"], seed=42)\n", + "child_generator = TagGenerator(\"span\", text=template)\n", + "generator = TagGenerator(\n", + " \"div\",\n", + " attrs=[\n", + " AttributeGenerator(\"class\", value=\"book\"),\n", + " AttributeGenerator(\"role\", value=\"section\"),\n", + " ],\n", + " children=[child_generator],\n", + ")\n", + "\n", + "# 2: define the selector\n", + "selector = (\n", + " TypeSelector(\"div\")\n", + " & ClassSelector(\"book\")\n", + " & AttributeSelector(\"role\", value=\"section\")\n", + ") > (TypeSelector(\"span\") & PatternSelector(re.compile(r\"^Hello\")))\n", + "\n", + "# 3: generate the soup\n", + "text = generator.generate()\n", + "soup = BeautifulSoup(text, features=\"lxml\")\n", + "\n", + "# 4: test selector on generated soup\n", + "selector.find(soup)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, we dynamically generate the HTML content using `TagGenerator` and then check whether the selector correctly identifies the intended elements." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "By using these generators, you can easily create HTML code to test whether your `soupsavvy` selector is correctly defined. 
This approach enables you to validate complex selectors in a dynamic and controlled environment.\n", + "\n", + "**Enjoy `soupsavvy` and leave us feedback!** \n", + "**Happy scraping!**" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "soupsavvy", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/source/index.md b/docs/source/index.md index 0a941792..6f609a4d 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -17,6 +17,8 @@ soupsavvy.selectors demos/about demos/selectors demos/combining +demos/css +demos/testing ``` ```{include} ../../README.md diff --git a/soupsavvy/selectors/css/selectors.py b/soupsavvy/selectors/css/selectors.py index 6839efea..545b3e65 100644 --- a/soupsavvy/selectors/css/selectors.py +++ b/soupsavvy/selectors/css/selectors.py @@ -6,6 +6,20 @@ They can be used in combination with other SoupSelector objects to create more complex tag selection conditions. + +This module contains the following classes: +* OnlyChild +* Empty +* FirstChild +* LastChild +* NthChild +* NthLastChild +* FirstOfType +* LastOfType +* NthOfType +* NthLastOfType +* OnlyOfType +* CSS - wrapper for simple search with CSS selectors """ from itertools import islice @@ -32,7 +46,11 @@ class CSSSoupSelector(SoupSelector, SelectableCSS): and can be easily used in combination with other SoupSelector objects. 
""" - def __init__(self, selector: str) -> None: + _SELECTOR: str + + def __init__(self) -> None: + selector = self.__class__._SELECTOR.format(*self._formats) + try: self._compiled = sv.compile(selector) except sv.SelectorSyntaxError: @@ -42,6 +60,13 @@ def __init__(self, selector: str) -> None: ) self._selector = selector + @property + def _formats(self) -> list[str]: + """ + List of arguments to be used to format css selector string of the selector. + """ + return [] + @property def css(self) -> str: return self._selector @@ -66,11 +91,6 @@ class OnlyChild(CSSSoupSelector): Class to select tags that are the only child of their parent. It uses the CSS selector `:only-child`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
❌ @@ -83,27 +103,16 @@ class OnlyChild(CSSSoupSelector): Tag is only selected if it is the only child of its parent. - In case of passing tag parameter, selector is `{tag}:only-child`. - Otherwise, selector is `:only-child`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> OnlyChild().selector :only-child - >>> OnlyChild("li").selector - li:only-child - - If tag is specified, two conditions must be met: - - Tag is the only child of its parent - - Tag has the specified tag name For more information on the :only-child selector, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:only-child """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:only-child") + _SELECTOR = ":only-child" class Empty(CSSSoupSelector): @@ -111,11 +120,6 @@ class Empty(CSSSoupSelector): Class to select tags that are empty, i.e., have no children. It uses the CSS selector `:empty`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
❌ @@ -124,20 +128,10 @@ class Empty(CSSSoupSelector): Tag is only selected if it is empty. - In case of passing tag parameter, selector is `{tag}:empty`. - Otherwise, selector is `:empty`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> Empty().selector :empty - >>> Empty("ol").selector - ol:empty - - If tag is specified, two conditions must be met: - - Tag is empty - - Tag has the specified tag name Notes -------- @@ -147,7 +141,7 @@ class Empty(CSSSoupSelector): Example -------- >>>
Hello World
❌ - >>>
❌ + ...
❌ These tags are not empty and do not match the selector. @@ -155,8 +149,7 @@ class Empty(CSSSoupSelector): https://developer.mozilla.org/en-US/docs/Web/CSS/:empty """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:empty") + _SELECTOR = ":empty" class FirstChild(CSSSoupSelector): @@ -164,11 +157,6 @@ class FirstChild(CSSSoupSelector): Class to select tags that are the first child of their parent. It uses the CSS selector `:first-child`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
✔️ @@ -181,20 +169,10 @@ class FirstChild(CSSSoupSelector): Tag is only selected if it is the first child of its parent. - In case of passing tag parameter, selector is `{tag}:first-child`. - Otherwise, selector is `:first-child`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> FirstChild().selector :first-child - >>> FirstChild("li").selector - li:first-child - - If tag is specified, two conditions must be met: - - Tag is the first child of its parent - - Tag has the specified tag name Notes -------- @@ -204,8 +182,7 @@ class FirstChild(CSSSoupSelector): https://developer.mozilla.org/en-US/docs/Web/CSS/:first-child """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:first-child") + _SELECTOR = ":first-child" class LastChild(CSSSoupSelector): @@ -213,11 +190,6 @@ class LastChild(CSSSoupSelector): Class to select tags that are the last child of their parent. It uses the CSS selector `:last-child`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
❌ @@ -231,20 +203,10 @@ class LastChild(CSSSoupSelector): Tag is only selected if it is the last child of its parent. Element that is the first and only child is matched as well. - In case of passing tag parameter, selector is `{tag}:last-child`. - Otherwise, selector is `:last-child`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> LastChild().selector :last-child - >>> LastChild("div").selector - div:last-child - - If tag is specified, two conditions must be met: - - Tag is the last child of its parent - - Tag has the specified tag name Notes -------- @@ -254,11 +216,22 @@ class LastChild(CSSSoupSelector): https://developer.mozilla.org/en-US/docs/Web/CSS/:last-child """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:last-child") + _SELECTOR = ":last-child" + + +class _NthBaseSelector(CSSSoupSelector): + """Base class for selectors based on nth formula.""" + + def __init__(self, nth: str) -> None: + self._nth = nth + super().__init__() + + @property + def _formats(self) -> list[str]: + return [self._nth] -class NthChild(CSSSoupSelector): +class NthChild(_NthBaseSelector): """ Class to select tags that are the nth child of their parent. It uses the CSS selector `:nth-child(n)`. @@ -267,37 +240,24 @@ class NthChild(CSSSoupSelector): ---------- nth : str, positional Number of the child to be selected. Can be a number or a formula. - tag : str, optional - Tag to be selected. If None, any tag is selected. - - In case of passing tag parameter, selector is `{tag}:nth-child(n)`. - Otherwise, selector is `:nth-child(n)`, which is equal to passing "*", - css wildcard selector as an argument. 
Example -------- >>> NthChild("2").selector :nth-child(2) - >>> NthChild("2", "li").selector - li:nth-child(2) >>> NthChild("2n+1").selector :nth-child(2n+1) - >>> NthChild("odd", "div").selector - div:nth-child(odd) - - If tag is specified, two conditions must be met: - - Tag is the nth child of its parent - - Tag has the specified tag name + >>> NthChild("odd").selector + :nth-child(odd) For more information on the formula, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-child """ - def __init__(self, nth: str, /, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:nth-child({nth})") + _SELECTOR = ":nth-child({})" -class NthLastChild(CSSSoupSelector): +class NthLastChild(_NthBaseSelector): """ Class to select tags that are the nth last child of their parent. It uses the CSS selector `:nth-last-child(n)`. @@ -306,34 +266,21 @@ class NthLastChild(CSSSoupSelector): ---------- nth : str, positional Number of the child to be selected. Can be a number or a formula. - tag : str, optional - Tag to be selected. If None, any tag is selected. - - In case of passing tag parameter, selector is `{tag}:nth-last-child(n)`. - Otherwise, selector is `:nth-last-child(n)`, which is equal to passing "*", - css wildcard selector as an argument. 
Example -------- >>> NthLastChild("2").selector :nth-last-child(2) - >>> NthLastChild("2", "li").selector - li:nth-last-child(2) >>> NthLastChild("2n+1").selector :nth-last-child(2n+1) - >>> NthLastChild("odd", "div").selector - div:nth-last-child(odd) - - If tag is specified, two conditions must be met: - - Tag is the nth last child of its parent - - Tag has the specified tag name + >>> NthLastChild("odd").selector + :nth-last-child(odd) For more information on the formula, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-last-child """ - def __init__(self, nth: str, /, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:nth-last-child({nth})") + _SELECTOR = ":nth-last-child({})" class FirstOfType(CSSSoupSelector): @@ -341,11 +288,6 @@ class FirstOfType(CSSSoupSelector): Class to select tags that are the first of their type in their parent. It uses the CSS selector `:first-of-type`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
✔️ @@ -359,32 +301,21 @@ class FirstOfType(CSSSoupSelector): Tag is only selected if it is the first of its type in its parent. - In case of passing tag parameter, selector is `{tag}:first-of-type`. - Otherwise, selector is `:first-of-type`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> FirstOfType().selector :first-of-type - >>> FirstOfType("div").selector - div:first-of-type - - If tag is specified, two conditions must be met: - - Tag is the first of its type in its parent - - Tag has the specified tag name Notes -------- - If tag is not specified, the first tag of any type is selected, which in + For this selector the first tag of any type is selected, which in case of finding single tag is equivalent to FirstChild() results. For more information on the formula, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:first-of-type """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:first-of-type") + _SELECTOR = ":first-of-type" class LastOfType(CSSSoupSelector): @@ -392,11 +323,6 @@ class LastOfType(CSSSoupSelector): Class to select tags that are the last of their type in their parent. It uses the CSS selector `:last-of-type`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
❌ @@ -411,35 +337,24 @@ class LastOfType(CSSSoupSelector): Tag is only selected if it is the last of its type in its parent. Element that is the first and only child is matched as well. - In case of passing tag parameter, selector is `{tag}:last-of-type`. - Otherwise, selector is `:last-of-type`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> LastOfType().selector :last-of-type - >>> LastOfType("div").selector - div:last-of-type - - If tag is specified, two conditions must be met: - - Tag is the last of its type in its parent - - Tag has the specified tag name Notes -------- - If tag is not specified, the first tag of any type is selected, which in + For this selector the last tag of any type is selected, which in case of finding single tag is the equivalent to LastChild() results. For more information on the formula, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:last-of-type """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:last-of-type") + _SELECTOR = ":last-of-type" -class NthOfType(CSSSoupSelector): +class NthOfType(_NthBaseSelector): """ Class to select tags that are the nth of their type in their parent. It uses the CSS selector `:nth-of-type(n)`. @@ -448,37 +363,24 @@ class NthOfType(CSSSoupSelector): ---------- nth : str, positional Number of the tag to be selected. Can be a number or a formula. - tag : str, optional - Tag to be selected. If None, any tag is selected. - - In case of passing tag parameter, selector is `{tag}:nth-of-type(n)`. - Otherwise, selector is `:nth-of-type(n)`, which is equal to passing "*", - css wildcard selector as an argument. 
Example -------- >>> NthOfType("2").selector :nth-of-type(2) - >>> NthOfType("2", "li").selector - li:nth-of-type(2) >>> NthOfType("2n+1").selector :nth-of-type(2n+1) - >>> NthOfType("even", "div").selector - div:nth-of-type(even) - - If tag is specified, two conditions must be met: - - Tag is the nth of its type in its parent - - Tag has the specified tag name + >>> NthOfType("even").selector + :nth-of-type(even) For more information on the formula, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-of-type """ - def __init__(self, nth: str, /, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:nth-of-type({nth})") + _SELECTOR = ":nth-of-type({})" -class NthLastOfType(CSSSoupSelector): +class NthLastOfType(_NthBaseSelector): """ Class to select tags that are the nth last of their type in their parent. It uses the CSS selector `:nth-last-of-type(n)`. @@ -487,34 +389,21 @@ class NthLastOfType(CSSSoupSelector): ---------- nth : str, positional Number of the tag to be selected. Can be a number or a formula. - tag : str, optional - Tag to be selected. If None, any tag is selected. - - In case of passing tag parameter, selector is `{tag}:nth-last-of-type(n)`. - Otherwise, selector is `:nth-last-of-type(n)`, which is equal to passing "*", - css wildcard selector as an argument. 
Example -------- >>> NthLastOfType("2").selector :nth-last-of-type(2) - >>> NthLastOfType("2", "li").selector - li:nth-last-of-type(2) >>> NthLastOfType("2n+1").selector :nth-last-of-type(2n+1) - >>> NthLastOfType("even", "div").selector - div:nth-last-of-type(even) - - If tag is specified, two conditions must be met: - - Tag is the nth last of its type in its parent - - Tag has the specified tag name + >>> NthLastOfType("even").selector + :nth-last-of-type(even) For more information on the formula, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-last-of-type """ - def __init__(self, nth: str, /, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:nth-last-of-type({nth})") + _SELECTOR = ":nth-last-of-type({})" class OnlyOfType(CSSSoupSelector): @@ -522,11 +411,6 @@ class OnlyOfType(CSSSoupSelector): Class to select tags that are the only of their type in their parent. It uses the CSS selector `:only-of-type`. - Parameters - ---------- - tag : str, optional - Tag to be selected. If None, any tag is selected. - Example -------- >>>
❌ @@ -541,27 +425,16 @@ class OnlyOfType(CSSSoupSelector): Tag is only selected if it is the only tag of its type in its parent. - In case of passing tag parameter, selector is `{tag}:only-of-type`. - Otherwise, selector is `:only-of-type`, which is equal to passing "*", - css wildcard selector as an argument. - Example -------- >>> OnlyOfType().selector :only-of-type - >>> OnlyOfType("div").selector - div:only-of-type - - If tag is specified, two conditions must be met: - - Tag is the only tag of its type in its parent - - Tag has the specified tag name For more information on the :only-of-type selector, see: https://developer.mozilla.org/en-US/docs/Web/CSS/:only-of-type """ - def __init__(self, tag: Optional[str] = None) -> None: - super().__init__(f"{tag or ''}:only-of-type") + _SELECTOR = ":only-of-type" class CSS(CSSSoupSelector): @@ -594,5 +467,12 @@ class CSS(CSSSoupSelector): ... ✔️ """ + _SELECTOR = "{}" + def __init__(self, css: str) -> None: - super().__init__(css) + self._css = css + super().__init__() + + @property + def _formats(self) -> list[str]: + return [self._css] diff --git a/soupsavvy/testing/__init__.py b/soupsavvy/testing/__init__.py index e69de29b..41f63ccd 100644 --- a/soupsavvy/testing/__init__.py +++ b/soupsavvy/testing/__init__.py @@ -0,0 +1,11 @@ +from .generators import AttributeGenerator, TagGenerator +from .generators.templates import ChoiceTemplate, RandomTemplate +from .generators.templates.base import BaseTemplate + +__all__ = [ + "AttributeGenerator", + "TagGenerator", + "ChoiceTemplate", + "RandomTemplate", + "BaseTemplate", +] diff --git a/soupsavvy/testing/generators/generators.py b/soupsavvy/testing/generators/generators.py index 828efb48..5f5ece63 100644 --- a/soupsavvy/testing/generators/generators.py +++ b/soupsavvy/testing/generators/generators.py @@ -117,7 +117,6 @@ def __init__(self, name: str, value: TemplateType = None) -> None: value : TemplateType, optional The value of the attribute. Defaults to None. 
""" - self._check_name(name) self.name = name self.value = _get_template_type( @@ -276,11 +275,14 @@ def __init__( The text content of the tag. Defaults to None, which generates empty string. """ + if isinstance(attrs, str): + raise TypeError("'attrs' must be an iterable of attributes, not a string") + self._void = name in namespace.VOID_TAGS if self._void and children: raise exc.VoidTagWithChildrenException( - f"{name} is a void tag and cannot have children" + f"'{name}' is a void tag and cannot have children" ) self._check_name(name) @@ -291,6 +293,7 @@ def __init__( param="text", default=DEFAULT_TEXT_TEMPLATE, ) + self.attributes = self._process_attributes(attrs) self.children = [ child if isinstance(child, TagGenerator) else TagGenerator(child) @@ -356,7 +359,7 @@ def _process_attributes( except AttributeGeneratorInitExceptions as e: raise exc.AttributeParsingError( f"Attribute {attr} could not be parsed into AttributeGenerator " - "due to following error." + "due to following error:" ) from e attr_name = attr.name diff --git a/tests/soupsavvy/selectors/css/selectors/css_wrapper_test.py b/tests/soupsavvy/selectors/css/selectors/css_wrapper_test.py index 388f4681..b6088bb6 100644 --- a/tests/soupsavvy/selectors/css/selectors/css_wrapper_test.py +++ b/tests/soupsavvy/selectors/css/selectors/css_wrapper_test.py @@ -55,7 +55,7 @@ def test_find_returns_first_tag_matching_selector(self):

3

""" - bs = to_bs(text) + bs = find_body_element(to_bs(text)) selector = CSS("div.widget") result = selector.find(bs) assert strip(str(result)) == strip("""
1
""") @@ -74,7 +74,7 @@ def test_find_returns_none_if_no_match_and_strict_false(self): """ - bs = to_bs(text) + bs = find_body_element(to_bs(text)) selector = CSS("div.widget") result = selector.find(bs) assert result is None @@ -93,7 +93,7 @@ def test_find_raises_exception_if_no_match_and_strict_true(self): """ - bs = to_bs(text) + bs = find_body_element(to_bs(text)) selector = CSS("div.widget") with pytest.raises(TagNotFoundException): @@ -112,7 +112,7 @@ def test_find_all_returns_all_matching_elements(self):

3

""" - bs = to_bs(text) + bs = find_body_element(to_bs(text)) selector = CSS("div.widget") result = selector.find_all(bs) @@ -133,7 +133,7 @@ def test_find_all_returns_empty_list_when_no_match(self): """ - bs = to_bs(text) + bs = find_body_element(to_bs(text)) selector = CSS("div.widget") result = selector.find_all(bs) assert result == [] diff --git a/tests/soupsavvy/selectors/css/selectors/empty_test.py b/tests/soupsavvy/selectors/css/selectors/empty_test.py index e29d2ae7..32276172 100644 --- a/tests/soupsavvy/selectors/css/selectors/empty_test.py +++ b/tests/soupsavvy/selectors/css/selectors/empty_test.py @@ -12,14 +12,10 @@ class TestEmpty: """Class with unit tests for Empty tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if css property returns correct value.""" assert Empty().css == ":empty" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert Empty("div").css == "div:empty" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -45,27 +41,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""
"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
Hello
-
-
-

text 1

-

-
-
- Hello -
- """ - bs = find_body_element(to_bs(text)) - selector = Empty("div") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""
"""), - strip("""
"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ diff --git a/tests/soupsavvy/selectors/css/selectors/first_child_test.py b/tests/soupsavvy/selectors/css/selectors/first_child_test.py index 3b3fe2fd..8f2cbca5 100644 --- a/tests/soupsavvy/selectors/css/selectors/first_child_test.py +++ b/tests/soupsavvy/selectors/css/selectors/first_child_test.py @@ -12,14 +12,10 @@ class TestFirstChild: """Class with unit tests for FirstChild tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if css property returns correct value.""" assert FirstChild().css == ":first-child" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert FirstChild("div").css == "div:first-child" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -40,104 +36,22 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""3"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
1
-
- - -
-
-

Hello

- -
3
- -
-
- - """ - bs = find_body_element(to_bs(text)) - selector = FirstChild("div") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""
1
"""), - strip(""""""), - strip("""
3
"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ -
Hello
-
- 1 -

- Hello -
- 2 - - - """ - bs = to_bs(text) - selector = FirstChild("a") - result = selector.find(bs) - assert strip(str(result)) == strip("""1""") - - def test_find_returns_none_if_no_match_and_strict_false(self): - """ - Tests if find returns None if no element matches the selector - and strict is False. - """ - text = """ -
+
1
+

23

Hello
-

Hello

+ 3

""" - bs = to_bs(text) - selector = FirstChild("a") + bs = find_body_element(to_bs(text)) + selector = FirstChild() result = selector.find(bs) - assert result is None - - def test_find_raises_exception_if_no_match_and_strict_true(self): - """ - Tests if find raises TagNotFoundException if no element matches the selector - and strict is True. - """ - text = """ -

-
-

- Hello -
-

Hello

- - """ - bs = to_bs(text) - selector = FirstChild("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True) - - def test_find_all_returns_empty_list_when_no_match(self): - """Tests if find returns an empty list if no element matches the selector.""" - text = """ -
-
-

- Hello -
-

Hello

- - """ - bs = to_bs(text) - selector = FirstChild("a") - result = selector.find_all(bs) - assert result == [] + assert strip(str(result)) == strip("""
1
""") def test_find_returns_first_matching_child_if_recursive_false(self): """ @@ -154,70 +68,10 @@ def test_find_returns_first_matching_child_if_recursive_false(self): Not child """ bs = find_body_element(to_bs(text)) - selector = FirstChild("a") + selector = FirstChild() result = selector.find(bs, recursive=False) assert strip(str(result)) == strip("""1""") - def test_find_returns_none_if_recursive_false_and_no_matching_child(self): - """ - Tests if find returns None if no child element matches the selector - and recursive is False. - """ - text = """ -
-
- Not child -

- Hello -
- Not child - """ - bs = find_body_element(to_bs(text)) - selector = FirstChild("a") - result = selector.find(bs, recursive=False) - assert result is None - - def test_find_raises_exception_with_recursive_false_and_strict_mode(self): - """ - Tests if find raises TagNotFoundException if no child element - matches the selector, when recursive is False and strict is True. - """ - text = """ -
-
- Not child -

- Hello -
- Not child - """ - bs = find_body_element(to_bs(text)) - selector = FirstChild("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True, recursive=False) - - def test_find_all_returns_empty_list_if_none_matching_children_when_recursive_false( - self, - ): - """ - Tests if find_all returns an empty list if no child element matches the selector - and recursive is False. - """ - text = """ -
-
- Not child -

- Hello -
- Not child - """ - bs = find_body_element(to_bs(text)) - selector = FirstChild("a") - result = selector.find_all(bs, recursive=False) - assert result == [] - def test_find_all_returns_all_matching_children_when_recursive_false(self): """ Tests if find_all returns all matching children if recursive is False. @@ -245,21 +99,20 @@ def test_find_all_returns_only_x_elements_when_limit_is_set(self): In this case only 2 first in order elements are returned. """ text = """ -
Hello
+
1
- 1 + 2

Hello
- 2 + 3 - """ bs = find_body_element(to_bs(text)) - selector = FirstChild("a") + selector = FirstChild() result = selector.find_all(bs, limit=2) assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""1"""), + strip("""
1
"""), strip("""2"""), ] diff --git a/tests/soupsavvy/selectors/css/selectors/first_of_type_test.py b/tests/soupsavvy/selectors/css/selectors/first_of_type_test.py index a9ecdf4b..23337a38 100644 --- a/tests/soupsavvy/selectors/css/selectors/first_of_type_test.py +++ b/tests/soupsavvy/selectors/css/selectors/first_of_type_test.py @@ -12,14 +12,10 @@ class TestFirstOfType: """Class with unit tests for FirstOfType tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if css property returns correct value.""" assert FirstOfType().css == ":first-of-type" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert FirstOfType("div").css == "div:first-of-type" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -46,167 +42,37 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""56"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
Hello
-
- 1 -
-

text

- 2 -
- Hello - 3 - Hello -
-

- Hello -
- - """ - bs = find_body_element(to_bs(text)) - selector = FirstOfType("a") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""1"""), - strip("""2"""), - strip("""3"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ -
+
1
Hello
- 1 - 2 - Hello + 2 + 3 + 4 Hello - - """ - bs = to_bs(text) - selector = FirstOfType("a") - result = selector.find(bs) - assert strip(str(result)) == strip("""1""") - - def test_find_returns_none_if_no_match_and_strict_false(self): - """ - Tests if find returns None if no element matches the selector - and strict is False. """ - text = """ -
-
Hello
-

Not child

- Hello - """ - bs = to_bs(text) - selector = FirstOfType("a") + bs = find_body_element(to_bs(text)) + selector = FirstOfType() result = selector.find(bs) - assert result is None - - def test_find_raises_exception_if_no_match_and_strict_true(self): - """ - Tests if find raises TagNotFoundException if no element matches the selector - and strict is True. - """ - text = """ -
-
Hello
-

Not child

- Hello - """ - bs = to_bs(text) - selector = FirstOfType("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True) - - def test_find_all_returns_empty_list_when_no_match(self): - """Tests if find returns an empty list if no element matches the selector.""" - text = """ -
-
Hello
-

Not child

- Hello - """ - bs = to_bs(text) - selector = FirstOfType("a") - result = selector.find_all(bs) - assert result == [] + assert strip(str(result)) == strip("""
1
""") def test_find_returns_first_matching_child_if_recursive_false(self): """ Tests if find returns first matching child element if recursive is False. """ text = """ -
-
Hello
+ 1 Not child - 1 - Hello - Hello -

- """ - bs = find_body_element(to_bs(text)) - selector = FirstOfType("a") - result = selector.find(bs, recursive=False) - assert strip(str(result)) == strip("""1""") - - def test_find_returns_none_if_recursive_false_and_no_matching_child(self): - """ - Tests if find returns None if no child element matches the selector - and recursive is False. - """ - text = """ -
+
2
Hello
- Not child - Hello -

+ 3 +

Not child

""" bs = find_body_element(to_bs(text)) - selector = FirstOfType("a") + selector = FirstOfType() result = selector.find(bs, recursive=False) - assert result is None - - def test_find_raises_exception_with_recursive_false_and_strict_mode(self): - """ - Tests if find raises TagNotFoundException if no child element - matches the selector, when recursive is False and strict is True. - """ - text = """ -
-
Hello
- Not child - Hello -

- """ - bs = find_body_element(to_bs(text)) - selector = FirstOfType("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True, recursive=False) - - def test_find_all_returns_empty_list_if_none_matching_children_when_recursive_false( - self, - ): - """ - Tests if find_all returns an empty list if no child element matches the selector - and recursive is False. - """ - text = """ -
-
Hello
- Not child - Hello -

- """ - bs = find_body_element(to_bs(text)) - selector = FirstOfType("a") - result = selector.find_all(bs, recursive=False) - assert result == [] + assert strip(str(result)) == strip("""1""") def test_find_all_returns_all_matching_children_when_recursive_false(self): """ diff --git a/tests/soupsavvy/selectors/css/selectors/last_child_test.py b/tests/soupsavvy/selectors/css/selectors/last_child_test.py index 20b77e07..e40ea847 100644 --- a/tests/soupsavvy/selectors/css/selectors/last_child_test.py +++ b/tests/soupsavvy/selectors/css/selectors/last_child_test.py @@ -12,14 +12,10 @@ class TestLastChild: """Class with unit tests for LastChild tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if css property returns correct value.""" assert LastChild().css == ":last-child" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert LastChild("div").css == "div:last-child" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -40,34 +36,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""34"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
-
-
1
- -
-
-

Hello

- - -
3
-
-
- - - """ - bs = find_body_element(to_bs(text)) - selector = LastChild("div") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""
1
"""), - strip(""""""), - strip("""
3
"""), - strip(""""""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ @@ -78,69 +46,14 @@ def test_find_returns_first_tag_matching_selector(self): 1
2 - Hello - + + 4 """ - bs = to_bs(text) - selector = LastChild("a") + bs = find_body_element(to_bs(text)) + selector = LastChild() result = selector.find(bs) assert strip(str(result)) == strip("""1""") - def test_find_returns_none_if_no_match_and_strict_false(self): - """ - Tests if find returns None if no element matches the selector - and strict is False. - """ - text = """ -
-
- Hello -

-
- -

Hello

- """ - bs = to_bs(text) - selector = LastChild("a") - result = selector.find(bs) - assert result is None - - def test_find_raises_exception_if_no_match_and_strict_true(self): - """ - Tests if find raises TagNotFoundException if no element matches the selector - and strict is True. - """ - text = """ -
-
- Hello -

-
- -

Hello

- """ - bs = to_bs(text) - selector = LastChild("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True) - - def test_find_all_returns_empty_list_when_no_match(self): - """Tests if find returns an empty list if no element matches the selector.""" - text = """ -
-
- Hello -

-
- -

Hello

- """ - bs = to_bs(text) - selector = LastChild("a") - result = selector.find_all(bs) - assert result == [] - def test_find_returns_first_matching_child_if_recursive_false(self): """ Tests if find returns first matching child element if recursive is False. @@ -157,73 +70,10 @@ def test_find_returns_first_matching_child_if_recursive_false(self): 1 """ bs = find_body_element(to_bs(text)) - selector = LastChild("a") + selector = LastChild() result = selector.find(bs, recursive=False) assert strip(str(result)) == strip("""1""") - def test_find_returns_none_if_recursive_false_and_no_matching_child(self): - """ - Tests if find returns None if no child element matches the selector - and recursive is False. - """ - text = """ -
Hello
-
- Hello -

- Not child -
- Hello - Not child - """ - bs = find_body_element(to_bs(text)) - selector = LastChild("a") - result = selector.find(bs, recursive=False) - assert result is None - - def test_find_raises_exception_with_recursive_false_and_strict_mode(self): - """ - Tests if find raises TagNotFoundException if no child element - matches the selector, when recursive is False and strict is True. - """ - text = """ -
Hello
-
- Hello -

- Not child -
- Hello - Not child - """ - bs = find_body_element(to_bs(text)) - selector = LastChild("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True, recursive=False) - - def test_find_all_returns_empty_list_if_none_matching_children_when_recursive_false( - self, - ): - """ - Tests if find_all returns an empty list if no child element matches the selector - and recursive is False. - """ - text = """ -
Hello
-
- Hello -

- Not child -
- Hello - Not child - """ - bs = find_body_element(to_bs(text)) - selector = LastChild("a") - result = selector.find_all(bs, recursive=False) - assert result == [] - def test_find_all_returns_all_matching_children_when_recursive_false(self): """ Tests if find_all returns all matching children if recursive is False. @@ -259,11 +109,11 @@ def test_find_all_returns_only_x_elements_when_limit_is_set(self): 1
2 - Hello - + + 4 """ bs = find_body_element(to_bs(text)) - selector = LastChild("a") + selector = LastChild() result = selector.find_all(bs, limit=2) assert list(map(lambda x: strip(str(x)), result)) == [ diff --git a/tests/soupsavvy/selectors/css/selectors/last_of_type_test.py b/tests/soupsavvy/selectors/css/selectors/last_of_type_test.py index c1ba9601..950dc6f4 100644 --- a/tests/soupsavvy/selectors/css/selectors/last_of_type_test.py +++ b/tests/soupsavvy/selectors/css/selectors/last_of_type_test.py @@ -12,14 +12,10 @@ class TestLastOfType: """Class with unit tests for LastOfType tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert LastOfType().css == ":last-of-type" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert LastOfType("div").css == "div:last-of-type" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -46,34 +42,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""56"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
Hello
- -
-

text

-
- Hello - - 1 -
-

- Hello - 2 -
-
- 3 - """ - bs = find_body_element(to_bs(text)) - selector = LastOfType("a") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""1"""), - strip("""2"""), - strip("""3"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ @@ -81,132 +49,32 @@ def test_find_returns_first_tag_matching_selector(self): Hello
Hello
1 - Hello - 2 - + 2 + 3 +

45

""" - bs = to_bs(text) - selector = LastOfType("a") + bs = find_body_element(to_bs(text)) + selector = LastOfType() result = selector.find(bs) assert strip(str(result)) == strip("""1""") - def test_find_returns_none_if_no_match_and_strict_false(self): - """ - Tests if find returns None if no element matches the selector - and strict is False. - """ - text = """ -
-
Hello
-

- Hello - """ - bs = to_bs(text) - selector = LastOfType("a") - result = selector.find(bs) - assert result is None - - def test_find_raises_exception_if_no_match_and_strict_true(self): - """ - Tests if find raises TagNotFoundException if no element matches the selector - and strict is True. - """ - text = """ -
-
Hello
-

- Hello - """ - bs = to_bs(text) - selector = LastOfType("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True) - - def test_find_all_returns_empty_list_when_no_match(self): - """Tests if find returns an empty list if no element matches the selector.""" - text = """ -
-
Hello
-

- Hello - """ - bs = to_bs(text) - selector = LastOfType("a") - result = selector.find_all(bs) - assert result == [] - def test_find_returns_first_matching_child_if_recursive_false(self): """ Tests if find returns first matching child element if recursive is False. """ text = """
-
Hello
- Not child Hello - Hello - 1 -

- """ - bs = find_body_element(to_bs(text)) - selector = LastOfType("a") - result = selector.find(bs, recursive=False) - assert strip(str(result)) == strip("""1""") - - def test_find_returns_none_if_recursive_false_and_no_matching_child(self): - """ - Tests if find returns None if no child element matches the selector - and recursive is False. - """ - text = """ -
Hello
Not child - Hello -

+ 1 + 2 +

3

""" bs = find_body_element(to_bs(text)) - selector = LastOfType("a") + selector = LastOfType() result = selector.find(bs, recursive=False) - assert result is None - - def test_find_raises_exception_with_recursive_false_and_strict_mode(self): - """ - Tests if find raises TagNotFoundException if no child element - matches the selector, when recursive is False and strict is True. - """ - text = """ -
-
Hello
- Not child - Hello -

- """ - bs = find_body_element(to_bs(text)) - selector = LastOfType("a") - - with pytest.raises(TagNotFoundException): - selector.find(bs, strict=True, recursive=False) - - def test_find_all_returns_empty_list_if_none_matching_children_when_recursive_false( - self, - ): - """ - Tests if find_all returns an empty list if no child element matches the selector - and recursive is False. - """ - text = """ -
-
Hello
- Not child - Hello -

- """ - bs = find_body_element(to_bs(text)) - selector = LastOfType("a") - result = selector.find_all(bs, recursive=False) - assert result == [] + assert strip(str(result)) == strip("""1""") def test_find_all_returns_all_matching_children_when_recursive_false(self): """ diff --git a/tests/soupsavvy/selectors/css/selectors/nth_child_test.py b/tests/soupsavvy/selectors/css/selectors/nth_child_test.py index 58e49712..1898ac7e 100644 --- a/tests/soupsavvy/selectors/css/selectors/nth_child_test.py +++ b/tests/soupsavvy/selectors/css/selectors/nth_child_test.py @@ -12,16 +12,10 @@ class TestNthChild: """Class with unit tests for NthChild tag selector.""" - def test_selector_is_correct_without_tag(self): - """ - Tests if selector property returns correct value without specifying tag. - """ + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert NthChild("2n").css == ":nth-child(2n)" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert NthChild("2n", tag="div").css == "div:nth-child(2n)" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -46,30 +40,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""
5
"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
- Not p -
-

Hello

-

1

- Hello -

2

-
-
Not p
- Hello -

3

- """ - bs = find_body_element(to_bs(text)) - selector = NthChild("2n", tag="p") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""

1

"""), - strip("""

2

"""), - strip("""

3

"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ @@ -281,47 +251,6 @@ def test_init_raise_exception_with_invalid_selector(self): with pytest.raises(InvalidCSSSelector): NthChild("2x+1") - @pytest.mark.parametrize( - argnames="nth, expected", - argvalues=[ - ("2n", [2, 4, 6]), - ("2n+1", [1, 3, 5]), - # ignores whitespaces - (" 2n + 1", [1, 3, 5]), - ("-n+3", [1, 2, 3]), - ("even", [2, 4, 6]), - ("odd", [1, 3, 5]), - ("3", [3]), - ("-3n", []), - ("-3n+10", [1, 4]), - ], - ) - def test_returns_elements_based_on_nth_selector_and_tag( - self, nth: str, expected: list[int] - ): - """ - Tests if find_all returns all elements with specified tag name - matching various nth selectors. - """ - text = """ -
1
-
2
-
3
-
4
-
5
-
6
- Hello - Hello -

- """ - bs = find_body_element(to_bs(text)) - selector = NthChild(nth, tag="div") - results = selector.find_all(bs) - - assert list(map(lambda x: strip(str(x)), results)) == [ - f"""
{i}
""" for i in expected - ] - @pytest.mark.parametrize( argnames="nth, expected", argvalues=[ diff --git a/tests/soupsavvy/selectors/css/selectors/nth_last_child_test.py b/tests/soupsavvy/selectors/css/selectors/nth_last_child_test.py index d7053ae9..1d610df3 100644 --- a/tests/soupsavvy/selectors/css/selectors/nth_last_child_test.py +++ b/tests/soupsavvy/selectors/css/selectors/nth_last_child_test.py @@ -12,16 +12,10 @@ class TestNthLastChild: """Class with unit tests for NthLastChild tag selector.""" - def test_selector_is_correct_without_tag(self): - """ - Tests if selector property returns correct value without specifying tag. - """ + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert NthLastChild("2n").css == ":nth-last-child(2n)" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert NthLastChild("2n", tag="div").css == "div:nth-last-child(2n)" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -46,31 +40,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""
5
"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ - Not p -
-

1

- Hello -

2Hello

-

3

-
-

4

-

Not nth

-
Not p
- - """ - bs = find_body_element(to_bs(text)) - selector = NthLastChild("2n", tag="p") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""

1

"""), - strip("""

2Hello

"""), - strip("""

3

"""), - strip("""

4

"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ @@ -282,47 +251,6 @@ def test_init_raise_exception_with_invalid_selector(self): with pytest.raises(InvalidCSSSelector): NthLastChild("2x+1") - @pytest.mark.parametrize( - argnames="nth, expected", - argvalues=[ - ("2n", [1, 3, 5]), - ("2n+1", [2, 4, 6]), - # ignores whitespaces - (" 2n + 1", [2, 4, 6]), - ("-n+3", [4, 5, 6]), - ("even", [1, 3, 5]), - ("odd", [2, 4, 6]), - ("3", [4]), - ("-3n", []), - ("-3n+10", [3, 6]), - ], - ) - def test_returns_elements_based_on_nth_selector_and_tag( - self, nth: str, expected: list[int] - ): - """ - Tests if find_all returns all elements with specified tag name - matching various nth selectors. - """ - text = """ - Hello - Hello -

-
1
-
2
-
3
-
4
-
5
-
6
- """ - bs = find_body_element(to_bs(text)) - selector = NthLastChild(nth, tag="div") - results = selector.find_all(bs) - - assert list(map(lambda x: strip(str(x)), results)) == [ - f"""
{i}
""" for i in expected - ] - @pytest.mark.parametrize( argnames="nth, expected", argvalues=[ diff --git a/tests/soupsavvy/selectors/css/selectors/nth_last_of_type_test.py b/tests/soupsavvy/selectors/css/selectors/nth_last_of_type_test.py index 61eab3fc..c880d877 100644 --- a/tests/soupsavvy/selectors/css/selectors/nth_last_of_type_test.py +++ b/tests/soupsavvy/selectors/css/selectors/nth_last_of_type_test.py @@ -14,16 +14,10 @@ class TestNthLastOfType: """Class with unit tests for NthLastOfType tag selector.""" - def test_selector_is_correct_without_tag(self): - """ - Tests if selector property returns correct value without specifying tag. - """ + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert NthLastOfType("2n").css == ":nth-last-of-type(2n)" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert NthLastOfType("2n", tag="div").css == "div:nth-last-of-type(2n)" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -51,31 +45,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""5"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
- 1 -
Not a
- Hello -
-

-

Hello

- Hello - 2 -
-

3

- Hello - """ - bs = find_body_element(to_bs(text)) - selector = NthLastOfType("2n", tag="a") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""1"""), - strip("""2"""), - strip("""

3

"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ @@ -322,47 +291,6 @@ def test_init_raise_exception_with_invalid_selector(self): with pytest.raises(InvalidCSSSelector): NthLastOfType("2x+1") - @pytest.mark.parametrize( - argnames="nth, expected", - argvalues=[ - ("2n", [1, 3, 5]), - ("2n+1", [2, 4, 6]), - # ignores whitespaces - (" 2n + 1", [2, 4, 6]), - ("-n+3", [4, 5, 6]), - ("even", [1, 3, 5]), - ("odd", [2, 4, 6]), - ("3", [4]), - ("-3n", []), - ("-3n+10", [3, 6]), - ], - ) - def test_returns_elements_based_on_nth_selector_and_tag( - self, nth: str, expected: list[int] - ): - """ - Tests if find_all returns all elements with specified tag name - matching various nth selectors. - """ - text = """ -

text 1

- Hello -

text 2

-

text 3

-
Hello
- Hello -

text 4

-

Hello world

-

text 5

-

text 6

- """ - bs = find_body_element(to_bs(text)) - selector = NthLastOfType(nth, tag="p") - results = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), results)) == [ - f"""

text {i}

""" for i in expected - ] - @pytest.mark.parametrize( argnames="nth, expected", argvalues=[ diff --git a/tests/soupsavvy/selectors/css/selectors/nth_of_type_test.py b/tests/soupsavvy/selectors/css/selectors/nth_of_type_test.py index b6d62592..4714b55f 100644 --- a/tests/soupsavvy/selectors/css/selectors/nth_of_type_test.py +++ b/tests/soupsavvy/selectors/css/selectors/nth_of_type_test.py @@ -14,16 +14,10 @@ class TestNthOfType: """Class with unit tests for NthOfType tag selector.""" - def test_selector_is_correct_without_tag(self): - """ - Tests if selector property returns correct value without specifying tag. - """ + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert NthOfType("2n").css == ":nth-of-type(2n)" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert NthOfType("2n", tag="div").css == "div:nth-of-type(2n)" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -51,31 +45,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""5"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
- Hello - 1 -
Hello
-
-

-

Hello

- Hello - 2 -
- Hello -

3

- """ - bs = find_body_element(to_bs(text)) - selector = NthOfType("2n", tag="a") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""1"""), - strip("""2"""), - strip("""

3

"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ @@ -322,47 +291,6 @@ def test_init_raise_exception_with_invalid_selector(self): with pytest.raises(InvalidCSSSelector): NthOfType("2x+1") - @pytest.mark.parametrize( - argnames="nth, expected", - argvalues=[ - ("2n", [2, 4, 6]), - ("2n+1", [1, 3, 5]), - # ignores whitespaces - (" 2n + 1", [1, 3, 5]), - ("-n+3", [1, 2, 3]), - ("even", [2, 4, 6]), - ("odd", [1, 3, 5]), - ("3", [3]), - ("-3n", []), - ("-3n+10", [1, 4]), - ], - ) - def test_returns_elements_based_on_nth_selector_and_tag( - self, nth: str, expected: list[int] - ): - """ - Tests if find_all returns all elements with specified tag name - matching various nth selectors. - """ - text = """ -

text 1

- Hello -

text 2

-

text 3

-
Hello
- Hello -

text 4

-

Hello world

-

text 5

-

text 6

- """ - bs = find_body_element(to_bs(text)) - selector = NthOfType(nth, tag="p") - results = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), results)) == [ - f"""

text {i}

""" for i in expected - ] - @pytest.mark.parametrize( argnames="nth, expected", argvalues=[ diff --git a/tests/soupsavvy/selectors/css/selectors/only_child_test.py b/tests/soupsavvy/selectors/css/selectors/only_child_test.py index eaab59a3..fabace18 100644 --- a/tests/soupsavvy/selectors/css/selectors/only_child_test.py +++ b/tests/soupsavvy/selectors/css/selectors/only_child_test.py @@ -12,14 +12,10 @@ class TestOnlyChild: """Class with unit tests for OnlyChild tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert OnlyChild().css == ":only-child" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert OnlyChild("div").css == "div:only-child" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -41,26 +37,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""3"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
-

1

-

Hello

-

3

-
- 3 - -
- """ - bs = find_body_element(to_bs(text)) - selector = OnlyChild("a") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""

3

"""), - strip("""3"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ diff --git a/tests/soupsavvy/selectors/css/selectors/only_of_type_test.py b/tests/soupsavvy/selectors/css/selectors/only_of_type_test.py index c7197c8a..e2f8d5a3 100644 --- a/tests/soupsavvy/selectors/css/selectors/only_of_type_test.py +++ b/tests/soupsavvy/selectors/css/selectors/only_of_type_test.py @@ -12,14 +12,10 @@ class TestOnlyOfType: """Class with unit tests for OnlyOfType tag selector.""" - def test_selector_is_correct_without_tag(self): - """Tests if selector property returns correct value without specifying tag.""" + def test_css_selector_is_correct(self): + """Tests if selector property returns correct value.""" assert OnlyOfType().css == ":only-of-type" - def test_selector_is_correct_with_tag(self): - """Tests if selector property returns correct value when specifying tag.""" - assert OnlyOfType("div").css == "div:only-of-type" - def test_find_all_returns_all_tags_for_selector_without_tag_name(self): """Tests if find_all method returns all tags for selector without tag name.""" text = """ @@ -42,26 +38,6 @@ def test_find_all_returns_all_tags_for_selector_without_tag_name(self): strip("""

4

"""), ] - def test_find_all_returns_all_tags_for_selector_with_tag_name(self): - """Tests if find_all method returns all tags for selector with tag name.""" - text = """ -
-

1

-

Hello

-

-
-

2

- -
- """ - bs = find_body_element(to_bs(text)) - selector = OnlyOfType("p") - result = selector.find_all(bs) - assert list(map(lambda x: strip(str(x)), result)) == [ - strip("""

1

"""), - strip("""

2

"""), - ] - def test_find_returns_first_tag_matching_selector(self): """Tests if find method returns first tag matching selector.""" text = """ diff --git a/tests/soupsavvy/testing/generators/generators_test.py b/tests/soupsavvy/testing/generators/generators_test.py index eb4b5d36..139067bc 100644 --- a/tests/soupsavvy/testing/generators/generators_test.py +++ b/tests/soupsavvy/testing/generators/generators_test.py @@ -100,6 +100,14 @@ def test_raises_exception_if_empty_string_passed_as_name(self): class TestTagGenerator: """Component with unit tests for the TagGenerator class.""" + def test_raises_error_if_attrs_parameter_is_string(self): + """ + Test that the generator raises TypeError upon initialization + if the attrs parameter is a string. + """ + with pytest.raises(TypeError): + TagGenerator(name="div", attrs="class") + def test_generates_empty_tag_if_only_name_specified(self): """ Test that the generator returns string with empty tag