Merge pull request #30 from vgalin/V2.0.0

V2.0.0
vgalin · Jun 30, 2021 · c3c7bd8
2 parents cc5b39c + c9caab6
commit c3c7bd8
Showing 8 changed files with 357 additions and 183 deletions.
diff --git a/README.md b/README.md
@@ -49,10 +49,10 @@ hti = Html2Image()
 <summary> Multiple arguments can be passed to the constructor (click to expand):</summary>
 
 -   `browser` :  Browser that will be used, set by default to `'chrome'` (the only browser supported by HTML2Image at the moment)
--   `chrome_path` and  `firefox_path` : The path or the command that can be used to find the executable of a specific browser.
+-   `browser_path` : The path or the command that can be used to find the executable of a specific browser.
 -   `output_path` : Path to the folder to which taken screenshots will be outputed. Default is the current working directory of your python program.
 -   `size` : 2-Tuple reprensenting the size of the screenshots that will be taken. Default value is `(1920, 1080)`.
--   `temp_path` : Path that will be used by html2image to put together different resources *loaded* with the `load_str` and `load_file` methods. Default value is `%TEMP%/html2image` on Windows, and `/tmp/html2image` on Linux and MacOS.
+-   `temp_path` : Path that will be used to put together different resources when screenshotting strings or files. Default value is `%TEMP%/html2image` on Windows, and `/tmp/html2image` on Linux and MacOS.
 
 Example:
 ```python
@@ -208,6 +208,62 @@ print(paths)
 # >>> ['D:\\myFiles\\letters_0.png', 'D:\\myFiles\\letters_1.png', 'D:\\myFiles\\letters_2.png']
 ```
 
+---
+
+#### Change browser flags
+In some cases, you may need to change the *flags* that are used to run the headless mode of a browser.
+
+Flags can be used to:
+- Change the default background color of the pages;
+- Hide the scrollbar;
+- Add delay before taking a screenshot;
+- Allow you to use Html2Image when you're root, as you will have to specify the `--no-sandbox` flag;
+
+You can find the full list of Chrome / Chromium flags [here](https://peter.sh/experiments/chromium-command-line-switches/).
+
+There are two ways to specify custom flags:
+```python
+# At the object instantiation
+hti = Html2Image(custom_flags=['--my_flag', '--my_other_flag=value'])
+
+# Afterwards
+hti.browser.flags = ['--my_flag', '--my_other_flag=value']
+```
+
+- **Flags example use-case: adding a delay before taking a screenshot**
+
+With Chrome / Chromium, the screenshot is taken as soon as there are no more "pending network fetches", but you may sometimes want to add a delay first, for example to wait for animations to end.
+There is a flag for this purpose, `--virtual-time-budget=VALUE_IN_MILLISECONDS`. You can use it like so:
+
+```python
+hti = Html2Image(
+    custom_flags=['--virtual-time-budget=10000', '--hide-scrollbars']
+)
+
+hti.screenshot(url='http://example.org')
+```
+
+- **Default flags**
+
+For ease of use, some flags are set by default. However, these default flags are dropped if you specify `custom_flags` or change the value of `browser.flags`:
+
+```python
+# Taking a look at the default flags
+>>> hti = Html2Image()
+>>> hti.browser.flags
+['--default-background-color=0', '--hide-scrollbars']
+
+# Changing the value of browser.flags gets rid of the default flags.
+>>> hti.browser.flags = ['--1', '--2']
+>>> hti.browser.flags
+['--1', '--2'] 
+
+# Using the custom_flags parameter gets rid of the default flags.
+>>> hti = Html2Image(custom_flags=['--a', '--b'])
+>>> hti.browser.flags
+['--a', '--b']
+```
+
 ## Using the CLI
 HTML2image comes with a Command Line Interface which you can use to generate screenshots from files and urls on the go.
 
@@ -234,16 +290,32 @@ You can call it by typing `hti` or `html2image` into a terminal.
 
 ## Testing
 
-Only basic testing is available at the moment. To run tests, run PyTest at the root of the project:
-```
+Only basic testing is available at the moment. To run tests, install the requirements (Pillow) and run PyTest at the root of the project:
+``` console
+pip install -r requirements-test.txt
 python -m pytest
 ```
 
+
+## FAQ
+
+- Can I automatically take a full page screenshot?  
+**Sadly no**, it is not easily possible. Html2Image relies on the headless mode of Chrome/Chromium browsers to take screenshots and there is no way to "ask" for a full page screenshot at the moment. If you know a way to take one (by estimating the page size for example) I would be happy to see it, so please open an issue or a discussion!
+
+- Can I add delay before taking a screenshot?   
+**Yes** you can, please take a look at the `Change browser flags` section of the readme.
+
+- Can I speed up the screenshot taking process?  
+**Yes**, when you are taking a lot of screenshots, you can achieve better performance by using parallel processing or multiprocessing. You can find an [example of it here](https://github.com/vgalin/html2image/issues/28#issuecomment-862608053).
+
+- Can I make a cookie modal disappear?  
+**Yes and no**. **No** because there is no option to do it magically and [extensions are not supported in headless Chrome](https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c5) (the [`I don't care about cookies`](https://www.i-dont-care-about-cookies.eu/) extension would have been useful in this case). **Yes** because you can make any element of a page disappear by retrieving its source code, modifying it as you wish, and finally screenshotting the modified source code.
 ## TODO List
--   A nice CLI (Currently in a WIP state)
-    - A better way to name the CLI's outputed files ?
--   Support of other browsers, such as Firefox
--   More extensive doc + comments
+-   A nice CLI (currently in a WIP state).
+-   Support of other browsers (such as Firefox, once their screenshot feature works).
 -   PDF generation?
--   Testing on push/PR with GitHub Actions
--   Use threads or multiprocessing to speed up screenshot taking
+-   Contributing, issue templates, pull request template, code of conduct.
+
+---
+
+*If you see any typos or notice things that are oddly worded, feel free to create an issue or a pull request.*
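The README section on default flags notes that passing `custom_flags` (or assigning `browser.flags`) replaces the defaults. If you want to keep the defaults and only add or override a few flags, one possible approach is to merge the lists yourself before handing them over. The sketch below is only an illustration: `merge_flags` and `DEFAULT_FLAGS` are hypothetical names that are not part of html2image, and the default values are copied from the README excerpt above.

```python
# Defaults as listed in the README excerpt above (copied here by hand).
DEFAULT_FLAGS = ['--default-background-color=0', '--hide-scrollbars']


def merge_flags(defaults, extras):
    """Return defaults followed by extras.

    A flag in `extras` replaces a default with the same name, where the
    name is the part before the first '='.
    """
    def flag_name(flag):
        return flag.split('=', 1)[0]

    extra_names = {flag_name(flag) for flag in extras}
    kept_defaults = [f for f in defaults if flag_name(f) not in extra_names]
    return kept_defaults + list(extras)


flags = merge_flags(DEFAULT_FLAGS, ['--virtual-time-budget=10000'])
print(flags)
```

The resulting list could then be passed as `Html2Image(custom_flags=flags)`, or assigned to `hti.browser.flags`, as described in the README section above.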
diff --git a/html2image/browsers/__init__.py b/html2image/browsers/__init__.py
diff --git a/html2image/browsers/browser.py b/html2image/browsers/browser.py
@@ -0,0 +1,22 @@
+from abc import ABC, abstractmethod
+
+
+class Browser(ABC):
+    """Abstract class representing a web browser."""
+
+    def __init__(self, flags):
+        pass
+
+    @property
+    @abstractmethod
+    def executable_path(self):
+        pass
+
+    @executable_path.setter
+    @abstractmethod
+    def executable_path(self, value):
+        pass
+
+    @abstractmethod
+    def screenshot(self, *args, **kwargs):
+        pass
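Because `Browser` is an abstract base class, a subclass can only be instantiated once it overrides `executable_path` (getter and setter) and `screenshot`. The following sketch illustrates that contract with a trimmed copy of the class above. `EchoBrowser` and `IncompleteBrowser` are hypothetical classes written for this example, not part of the pull request.

```python
from abc import ABC, abstractmethod


class Browser(ABC):
    """Trimmed copy of the abstract class from the diff above."""

    def __init__(self, flags):
        pass

    @property
    @abstractmethod
    def executable_path(self):
        pass

    @executable_path.setter
    @abstractmethod
    def executable_path(self, value):
        pass

    @abstractmethod
    def screenshot(self, *args, **kwargs):
        pass


class EchoBrowser(Browser):
    """Hypothetical backend that records calls instead of launching a browser."""

    def __init__(self, flags=None):
        self.flags = list(flags or [])
        self._executable_path = 'echo'
        self.calls = []

    @property
    def executable_path(self):
        return self._executable_path

    @executable_path.setter
    def executable_path(self, value):
        self._executable_path = value

    def screenshot(self, input, output_path,
                   output_file='screenshot.png', size=(1920, 1080)):
        self.calls.append((input, output_path, output_file, size))


class IncompleteBrowser(Browser):
    """Hypothetical subclass that omits screenshot(), so it stays abstract."""

    @property
    def executable_path(self):
        return None

    @executable_path.setter
    def executable_path(self, value):
        pass


browser = EchoBrowser(flags=['--hide-scrollbars'])
browser.screenshot('page.html', 'out')

try:
    IncompleteBrowser()
except TypeError as error:
    # Instantiating a class with unimplemented abstract methods fails here,
    # before __init__ is ever called.
    print(f'Cannot instantiate: {error}')
```

Note that the `TypeError` is raised before `__init__` runs, so a subclass that forgets one of the abstract members never reaches its own constructor logic.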
diff --git a/html2image/browsers/chrome.py b/html2image/browsers/chrome.py
@@ -0,0 +1,185 @@
+from .browser import Browser
+
+import subprocess
+import platform
+import os
+import shutil
+
+
+def _find_chrome(user_given_path=None):
+    """ Finds a Chrome executable.
+
+    Searches for Chrome at the given path. If no path is given,
+    tries to find Chrome or Chromium-browser on a Windows or Unix system.
+
+    Raises
+    ------
+    - `FileNotFoundError`
+        + If a suitable chrome executable could not be found.
+
+    Returns
+    -------
+    - str
+        + Path of the chrome executable on the current machine.
+    """
+
+    # TODO once other browsers are available:
+    # Ensure that the given executable is a chrome one.
+
+    if user_given_path is not None:
+        if os.path.isfile(user_given_path):
+            return user_given_path
+        else:
+            raise FileNotFoundError('Could not find chrome in the given path.')
+
+    if platform.system() == 'Windows':
+        prefixes = [
+            os.getenv('PROGRAMFILES(X86)'),
+            os.getenv('PROGRAMFILES'),
+            os.getenv('LOCALAPPDATA'),
+        ]
+
+        suffix = "Google\\Chrome\\Application\\chrome.exe"
+
+        for prefix in filter(None, prefixes):
+            path_candidate = os.path.join(prefix, suffix)
+            if os.path.isfile(path_candidate):
+                return path_candidate
+
+    elif platform.system() == "Linux":
+
+        # search google-chrome
+        version_result = subprocess.check_output(
+            ["google-chrome", "--version"]
+        )
+
+        if 'Google Chrome' in str(version_result):
+            return "google-chrome"
+
+        # else search chromium-browser
+
+        # snap seems to be a special case?
+        # see https://stackoverflow.com/q/63375327/12182226
+        version_result = subprocess.check_output(
+            ["chromium-browser", "--version"]
+        )
+        if 'snap' in str(version_result):
+            chrome_snap = (
+                '/snap/chromium/current/usr/lib/chromium-browser/chrome'
+            )
+            if os.path.isfile(chrome_snap):
+                return chrome_snap
+        else:
+            which_result = shutil.which('chromium-browser')
+            if which_result is not None and os.path.isfile(which_result):
+                return which_result
+
+    elif platform.system() == "Darwin":
+        # MacOS system
+        chrome_app = (
+            '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
+        )
+        version_result = subprocess.check_output(
+            [chrome_app, "--version"]
+        )
+        if "Google Chrome" in str(version_result):
+            return chrome_app
+
+    raise FileNotFoundError(
+        'Could not find a Chrome executable on this '
+        'machine, please specify it yourself.'
+    )
+
+
+class ChromeHeadless(Browser):
+    """
+        Chrome/Chromium browser wrapper.
+
+        Parameters
+        ----------
+        - `executable_path` : str, optional
+            + Path to a chrome executable.
+
+        - `flags` : list of str
+            + Flags to be used by the headless browser.
+            + Default flags are :
+                - '--default-background-color=0'
+                - '--hide-scrollbars'
+        - `print_command` : bool
+            + Whether or not to print the command used to take a screenshot.
+    """
+
+    def __init__(self, executable_path=None, flags=None, print_command=False):
+        self.executable_path = executable_path
+        if not flags:
+            self.flags = [
+                '--default-background-color=0',
+                '--hide-scrollbars',
+            ]
+        else:
+            self.flags = [flags] if isinstance(flags, str) else flags
+
+        self.print_command = print_command
+
+    @property
+    def executable_path(self):
+        return self._executable_path
+
+    @executable_path.setter
+    def executable_path(self, value):
+        self._executable_path = _find_chrome(value)
+
+    def screenshot(
+        self,
+        input,
+        output_path,
+        output_file='screenshot.png',
+        size=(1920, 1080),
+    ):
+        """ Calls Chrome or Chromium headless to take a screenshot.
+
+            Parameters
+            ----------
+            - `output_file`: str
+                + Name under which the screenshot will be saved.
+                + File extension (e.g. .png) has to be included.
+                + Default is screenshot.png
+            - `input`: str
+                + File or url that will be screenshotted.
+                + Cannot be None
+            - `size`: (int, int), optional
+                + Two values representing the window size of the headless
+                + browser and, by extension, the screenshot size.
+                + These two values must be greater than 0.
+            Raises
+            ------
+            - `ValueError`
+                + If the value of `size` is incorrect.
+                + If `input` is empty.
+        """
+
+        if not input:
+            raise ValueError('The `input` parameter is empty.')
+
+        if size[0] < 1 or size[1] < 1:
+            raise ValueError(
+                f'Could not screenshot "{output_file}" '
+                f'with a size of {size}:\n'
+                'A valid size consists of two integers greater than 0.'
+            )
+
+        # command used to launch chrome in
+        # headless mode and take a screenshot
+        command = [
+            f'{self.executable_path}',
+            '--headless',
+            f'--screenshot={os.path.join(output_path, output_file)}',
+            f'--window-size={size[0]},{size[1]}',
+            *self.flags,
+            f'{input}',
+        ]
+
+        if self.print_command:
+            print(' '.join(command))
+
+        subprocess.run(command)
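One thing to be aware of in the Linux branch of `_find_chrome`: `subprocess.check_output(["google-chrome", "--version"])` raises as soon as the `google-chrome` command is not on the `PATH`, so `chromium-browser` is never tried in that case. A more tolerant lookup could probe each candidate with `shutil.which`, which returns `None` instead of raising. This is a minimal sketch under stated assumptions, not the project's implementation: `find_first_executable` and `LINUX_CANDIDATES` are hypothetical names, the candidate order is a guess, and the snap special case handled in the diff is left out.

```python
import shutil


def find_first_executable(candidates):
    """Return the resolved path of the first candidate found, or None.

    Each candidate may be a bare command name (looked up on PATH)
    or a path to an executable file.
    """
    for candidate in candidates:
        resolved = shutil.which(candidate)
        if resolved is not None:
            return resolved
    return None


# Hypothetical candidate order for a Linux machine.
LINUX_CANDIDATES = ['google-chrome', 'chromium-browser', 'chromium']

path = find_first_executable(LINUX_CANDIDATES)
print(path or 'No Chrome/Chromium executable found on PATH.')
```

A caller could raise `FileNotFoundError` when the function returns `None`, which would keep the behaviour documented in the `_find_chrome` docstring.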
diff --git a/html2image/browsers/firefox.py b/html2image/browsers/firefox.py
@@ -0,0 +1,21 @@
+from .browser import Browser
+
+
+class FirefoxHeadless(Browser):
+
+    def __init__(self):
+        raise NotImplementedError(
+            "Could not make screenshot work on Firefox headless yet ...\n"
+            "See https://bugzilla.mozilla.org/show_bug.cgi?id=1715450"
+        )
+
+    @property
+    def executable_path(self):
+        pass
+
+    @executable_path.setter
+    def executable_path(self, value):
+        pass
+
+    def screenshot(self, *args, **kwargs):
+        pass
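The README's `browser` constructor argument suggests that a browser name is mapped to one of these backend classes. One way this could look is a small name-to-class registry, sketched below. Everything here is hypothetical: `ChromeStub` and `FirefoxStub` are stand-ins written for this example (they do not launch anything), and `BROWSERS` and `make_browser` are illustrative names rather than html2image's API.

```python
class ChromeStub:
    """Hypothetical stand-in for ChromeHeadless; does not launch anything."""

    def __init__(self, flags=None):
        self.flags = flags or []


class FirefoxStub:
    """Hypothetical stand-in mirroring the FirefoxHeadless stub in the diff."""

    def __init__(self, flags=None):
        raise NotImplementedError(
            'Firefox headless screenshots are not supported yet.'
        )


# Registry mapping a lowercase browser name to its backend class.
BROWSERS = {'chrome': ChromeStub, 'firefox': FirefoxStub}


def make_browser(name, flags=None):
    """Instantiate the backend registered under `name` (case-insensitive)."""
    try:
        browser_cls = BROWSERS[name.lower()]
    except KeyError:
        raise ValueError(
            f'Unknown browser {name!r}; expected one of {sorted(BROWSERS)}'
        ) from None
    return browser_cls(flags=flags)


browser = make_browser('Chrome', flags=['--hide-scrollbars'])
print(type(browser).__name__, browser.flags)
```

With this layout, an unknown name fails with a `ValueError`, while a known but unsupported backend surfaces its own `NotImplementedError`, so callers can tell the two situations apart.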