Merge pull request #30 from vgalin/V2.0.0
V2.0.0
vgalin authored Jun 30, 2021
2 parents cc5b39c + c9caab6 commit c3c7bd8
Showing 8 changed files with 357 additions and 183 deletions.
92 changes: 82 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -49,10 +49,10 @@ hti = Html2Image()
<summary> Multiple arguments can be passed to the constructor (click to expand):</summary>

- `browser` : Browser that will be used, set by default to `'chrome'` (the only browser supported by HTML2Image at the moment)
- `chrome_path` and `firefox_path` : The path or the command that can be used to find the executable of a specific browser.
- `browser_path` : The path or the command that can be used to find the executable of a specific browser.
- `output_path` : Path to the folder in which screenshots will be saved. Default is the current working directory of your Python program.
- `size` : 2-tuple representing the size of the screenshots that will be taken. Default value is `(1920, 1080)`.
- `temp_path` : Path that will be used by html2image to put together different resources *loaded* with the `load_str` and `load_file` methods. Default value is `%TEMP%/html2image` on Windows, and `/tmp/html2image` on Linux and MacOS.
- `temp_path` : Path that will be used to put together different resources when screenshotting strings or files. Default value is `%TEMP%/html2image` on Windows, and `/tmp/html2image` on Linux and macOS.

Example:
```python
@@ -208,6 +208,62 @@ print(paths)
# >>> ['D:\\myFiles\\letters_0.png', 'D:\\myFiles\\letters_1.png', 'D:\\myFiles\\letters_2.png']
```

---

#### Change browser flags
In some cases, you may need to change the *flags* that are used to run the headless mode of a browser.

Flags can be used to:
- Change the default background color of the pages;
- Hide the scrollbar;
- Add delay before taking a screenshot;
- Use Html2Image as root, which requires the `--no-sandbox` flag.

You can find the full list of Chrome / Chromium flags [here](https://peter.sh/experiments/chromium-command-line-switches/).

There are two ways to specify custom flags:
```python
# When instantiating the object
hti = Html2Image(custom_flags=['--my_flag', '--my_other_flag=value'])

# Afterwards
hti.browser.flags = ['--my_flag', '--my_other_flag=value']
```

- **Flags example use-case: adding a delay before taking a screenshot**

With Chrome / Chromium, a screenshot is taken as soon as there are no more "pending network fetches", but you may sometimes want to add a delay before the screenshot is taken, for example to wait for animations to end.
There is a flag for this purpose: `--virtual-time-budget=VALUE_IN_MILLISECONDS`. You can use it like so:

```python
hti = Html2Image(
custom_flags=['--virtual-time-budget=10000', '--hide-scrollbars']
)

hti.screenshot(url='http://example.org')
```

- **Default flags**

For ease of use, some flags are set by default. However, the default flags are not used if you specify `custom_flags` or change the value of `browser.flags`:

```python
# Taking a look at the default flags
>>> hti = Html2Image()
>>> hti.browser.flags
['--default-background-color=0', '--hide-scrollbars']

# Changing the value of browser.flags gets rid of the default flags.
>>> hti.browser.flags = ['--1', '--2']
>>> hti.browser.flags
['--1', '--2']

# Using the custom_flags parameter gets rid of the default flags.
>>> hti = Html2Image(custom_flags=['--a', '--b'])
>>> hti.browser.flags
['--a', '--b']
```
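
Because custom flags replace the defaults instead of extending them, keeping the defaults means listing them again. The sketch below mirrors the flag handling of `ChromeHeadless.__init__` from this commit without launching a browser. `DEFAULT_FLAGS` and `resolve_flags` are hypothetical names used only for this illustration; they are not part of Html2Image.

```python
# Hypothetical names; the logic mirrors ChromeHeadless.__init__ in this commit.
DEFAULT_FLAGS = ['--default-background-color=0', '--hide-scrollbars']


def resolve_flags(flags=None):
    """Return the flags the browser wrapper would end up with."""
    if not flags:
        # None or an empty list falls back to the defaults.
        return list(DEFAULT_FLAGS)
    # A single string is wrapped in a list; a list is used as given.
    return [flags] if isinstance(flags, str) else list(flags)


# Custom flags replace the defaults...
only_custom = resolve_flags(['--virtual-time-budget=10000'])

# ...so to keep them, pass the defaults along with the extra flag.
defaults_plus_custom = resolve_flags(
    DEFAULT_FLAGS + ['--virtual-time-budget=10000']
)
```

With a real instance, the equivalent would presumably be `hti.browser.flags = hti.browser.flags + ['--virtual-time-budget=10000']`.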

## Using the CLI
HTML2Image comes with a Command Line Interface which you can use to generate screenshots from files and URLs on the go.

@@ -234,16 +290,32 @@ You can call it by typing `hti` or `html2image` into a terminal.

## Testing

Only basic testing is available at the moment. To run tests, run PyTest at the root of the project:
```
Only basic testing is available at the moment. To run tests, install the requirements (Pillow) and run PyTest at the root of the project:
``` console
pip install -r requirements-test.txt
python -m pytest
```


## FAQ

- Can I automatically take a full page screenshot?
**Sadly no**, it is not easily possible. Html2Image relies on the headless mode of Chrome/Chromium browsers to take screenshots and there is no way to "ask" for a full page screenshot at the moment. If you know a way to take one (by estimating the page size for example) I would be happy to see it, so please open an issue or a discussion!

- Can I add delay before taking a screenshot?
**Yes** you can, please take a look at the `Change browser flags` section of the readme.

- Can I speed up the screenshot taking process?
**Yes**, when you are taking a lot of screenshots, you can achieve better performance using parallel processing or multiprocessing. You can find an [example of it here](https://github.com/vgalin/html2image/issues/28#issuecomment-862608053).

- Can I make a cookie modal disappear?
**Yes and no**. **No** because there is no option to do it magically and [extensions are not supported in headless Chrome](https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c5) (the [`I don't care about cookies`](https://www.i-dont-care-about-cookies.eu/) extension would have been useful in this case). **Yes** because you can make any element of a page disappear by retrieving its source code, modifying it as you wish, and finally screenshotting the modified source code.
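
The "retrieve, modify, screenshot" approach from the last answer can be sketched as follows. This is only an illustration under assumptions: the markup and the `#cookie-modal` selector are made up, and `hide_element` is a hypothetical helper, not part of Html2Image. It hides the element by injecting a CSS rule into the page source before the source is screenshotted.

```python
def hide_element(html, css_selector):
    """Return `html` with an injected CSS rule that hides `css_selector`."""
    rule = f'<style>{css_selector} {{ display: none !important; }}</style>'
    if '</head>' in html:
        # Insert the rule just before the first closing </head> tag.
        return html.replace('</head>', rule + '</head>', 1)
    # No closing </head> tag found: prepend the rule instead.
    return rule + html


# Made-up page source standing in for a downloaded page.
page_source = (
    '<html><head><title>Demo</title></head>'
    '<body><div id="cookie-modal">Accept cookies?</div>'
    '<h1>Hello</h1></body></html>'
)
modified_source = hide_element(page_source, '#cookie-modal')
```

The modified source could then be screenshotted as a string, assuming the `html_str` parameter of `Html2Image.screenshot`.
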
## TODO List
- A nice CLI (Currently in a WIP state)
- A better way to name the CLI's outputed files ?
- Support of other browsers, such as Firefox
- More extensive doc + comments
- A nice CLI (currently in a WIP state).
- Support of other browsers (such as Firefox, once its screenshot feature works).
- PDF generation?
- Testing on push/PR with GitHub Actions
- Use threads or multiprocessing to speed up screenshot taking
- Contributing, issue templates, pull request template, code of conduct.

---

*If you see any typos or notice things that are oddly said, feel free to create an issue or a pull request.*
Empty file added html2image/browsers/__init__.py
Empty file.
22 changes: 22 additions & 0 deletions html2image/browsers/browser.py
@@ -0,0 +1,22 @@
from abc import ABC, abstractmethod


class Browser(ABC):
    """Abstract class representing a web browser."""

    def __init__(self, flags):
        pass

    @property
    @abstractmethod
    def executable_path(self):
        pass

    @executable_path.setter
    @abstractmethod
    def executable_path(self, value):
        pass

    @abstractmethod
    def screenshot(self, *args, **kwargs):
        pass
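
A short, self-contained sketch of what this abstract base class enforces: a subclass can only be instantiated once it defines every abstract member. `IncompleteBrowser` and `DummyBrowser` are hypothetical classes written for this illustration, and a trimmed copy of `Browser` (without the setter) is repeated so the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class Browser(ABC):
    """Trimmed copy of the abstract class, repeated for self-containment."""

    @property
    @abstractmethod
    def executable_path(self):
        pass

    @abstractmethod
    def screenshot(self, *args, **kwargs):
        pass


class IncompleteBrowser(Browser):
    """Hypothetical subclass that forgets to implement `screenshot`."""

    @property
    def executable_path(self):
        return 'dummy'


class DummyBrowser(Browser):
    """Hypothetical subclass that implements every abstract member."""

    @property
    def executable_path(self):
        return 'dummy'

    def screenshot(self, *args, **kwargs):
        return 'screenshot taken'


try:
    IncompleteBrowser()
    instantiation_error = None
except TypeError as error:
    # Raised by the ABC machinery before __init__ even runs.
    instantiation_error = error

result = DummyBrowser().screenshot()
```

By the same rule, a subclass that defines `render` but not `screenshot`, as `FirefoxHeadless` does in this commit, would presumably fail with this `TypeError` before its own `__init__` is reached.
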
185 changes: 185 additions & 0 deletions html2image/browsers/chrome.py
@@ -0,0 +1,185 @@
from .browser import Browser

import subprocess
import platform
import os
import shutil


def _find_chrome(user_given_path=None):
    """ Finds a Chrome executable.

    Search Chrome on a given path. If no path given,
    try to find Chrome or Chromium-browser on a Windows or Unix system.

    Raises
    ------
    - `FileNotFoundError`
        + If a suitable chrome executable could not be found.

    Returns
    -------
    - str
        + Path of the chrome executable on the current machine.
    """

    # TODO when other browsers will be available:
    # Ensure that the given executable is a chrome one.

    if user_given_path is not None:
        if os.path.isfile(user_given_path):
            return user_given_path
        else:
            raise FileNotFoundError('Could not find chrome in the given path.')

    if platform.system() == 'Windows':
        prefixes = [
            os.getenv('PROGRAMFILES(X86)'),
            os.getenv('PROGRAMFILES'),
            os.getenv('LOCALAPPDATA'),
        ]

        suffix = "Google\\Chrome\\Application\\chrome.exe"

        for prefix in prefixes:
            path_candidate = os.path.join(prefix, suffix)
            if os.path.isfile(path_candidate):
                return path_candidate

    elif platform.system() == "Linux":

        # search google-chrome
        version_result = subprocess.check_output(
            ["google-chrome", "--version"]
        )

        if 'Google Chrome' in str(version_result):
            return "google-chrome"

        # else search chromium-browser

        # snap seems to be a special case?
        # see https://stackoverflow.com/q/63375327/12182226
        version_result = subprocess.check_output(
            ["chromium-browser", "--version"]
        )
        if 'snap' in str(version_result):
            chrome_snap = (
                '/snap/chromium/current/usr/lib/chromium-browser/chrome'
            )
            if os.path.isfile(chrome_snap):
                return chrome_snap
        else:
            which_result = shutil.which('chromium-browser')
            if which_result is not None and os.path.isfile(which_result):
                return which_result

    elif platform.system() == "Darwin":
        # MacOS system
        chrome_app = (
            '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
        )
        version_result = subprocess.check_output(
            [chrome_app, "--version"]
        )
        if "Google Chrome" in str(version_result):
            return chrome_app

    raise FileNotFoundError(
        'Could not find a Chrome executable on this '
        'machine, please specify it yourself.'
    )
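
The lookup above can fail before reaching its fallbacks: `os.getenv` returns `None` for an unset variable, which `os.path.join` rejects, and `subprocess.check_output` raises `FileNotFoundError` when the command does not exist. A more defensive variant might look like the sketch below. It is not the code from this commit: `windows_candidates` and `first_command_on_path` are hypothetical helpers, and the default list of command names is an assumption.

```python
import os
import shutil


def windows_candidates(environ=None):
    """Build candidate chrome.exe paths, skipping unset variables."""
    environ = os.environ if environ is None else environ
    suffix = 'Google\\Chrome\\Application\\chrome.exe'
    prefixes = [
        environ.get('PROGRAMFILES(X86)'),
        environ.get('PROGRAMFILES'),
        environ.get('LOCALAPPDATA'),
    ]
    # Filtering out None avoids a TypeError in os.path.join.
    return [os.path.join(prefix, suffix) for prefix in prefixes if prefix]


def first_command_on_path(names=('google-chrome', 'chromium-browser')):
    """Return the full path of the first command found on PATH, or None."""
    for name in names:
        found = shutil.which(name)
        if found is not None:
            return found
    return None


# Only one of the three variables is set in this made-up environment.
candidates = windows_candidates({'PROGRAMFILES': 'C:\\Program Files'})

# A command that is not installed yields None instead of an exception.
missing = first_command_on_path(('surely-not-an-installed-browser-xyz',))
```

Using `shutil.which` only checks that a command exists; it does not run the browser, so it cannot tell a snap wrapper from a regular install the way the `--version` check does.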


class ChromeHeadless(Browser):
    """
    Chrome/Chromium browser wrapper.

    Parameters
    ----------
    - `executable_path` : str, optional
        + Path to a chrome executable.
    - `flags` : list of str
        + Flags to be used by the headless browser.
        + Default flags are :
            - '--default-background-color=0'
            - '--hide-scrollbars'
    - `print_command` : bool
        + Whether or not to print the command used to take a screenshot.
    """

    def __init__(self, executable_path=None, flags=None, print_command=False):
        self.executable_path = executable_path
        if not flags:
            self.flags = [
                '--default-background-color=0',
                '--hide-scrollbars',
            ]
        else:
            self.flags = [flags] if isinstance(flags, str) else flags

        self.print_command = print_command

    @property
    def executable_path(self):
        return self._executable_path

    @executable_path.setter
    def executable_path(self, value):
        self._executable_path = _find_chrome(value)

    def screenshot(
        self,
        input,
        output_path,
        output_file='screenshot.png',
        size=(1920, 1080),
    ):
        """ Calls Chrome or Chromium headless to take a screenshot.

        Parameters
        ----------
        - `output_file`: str
            + Name as which the screenshot will be saved.
            + File extension (e.g. .png) has to be included.
            + Default is screenshot.png
        - `output_path`: str
            + Directory in which the screenshot will be saved.
        - `input`: str
            + File or url that will be screenshotted.
            + Cannot be None
        - `size`: (int, int), optional
            + Two values representing the window size of the headless
            + browser and by extension, the screenshot size.
            + These two values must be greater than 0.

        Raises
        ------
        - `ValueError`
            + If the value of `size` is incorrect.
            + If `input` is empty.
        """

        if not input:
            raise ValueError('The `input` parameter is empty.')

        if size[0] < 1 or size[1] < 1:
            raise ValueError(
                f'Could not screenshot "{output_file}" '
                f'with a size of {size}:\n'
                'A valid size consists of two integers greater than 0.'
            )

        # command used to launch chrome in
        # headless mode and take a screenshot
        command = [
            f'{self.executable_path}',
            '--headless',
            f'--screenshot={os.path.join(output_path, output_file)}',
            f'--window-size={size[0]},{size[1]}',
            *self.flags,
            f'{input}',
        ]

        if self.print_command:
            print(' '.join(command))

        subprocess.run(command)
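
To see what `screenshot` ends up running without launching a browser, the command assembly can be reproduced as a pure function. `build_command` is a hypothetical helper that mirrors the list built above; the executable name, URL and output folder in the example are placeholders.

```python
import os


def build_command(executable_path, target, output_path,
                  output_file='screenshot.png', size=(1920, 1080), flags=()):
    """Return the argument list for a headless Chrome screenshot."""
    return [
        executable_path,
        '--headless',
        f'--screenshot={os.path.join(output_path, output_file)}',
        f'--window-size={size[0]},{size[1]}',
        *flags,
        target,
    ]


command = build_command(
    'google-chrome',  # placeholder executable
    'http://example.org',
    output_path='out',
    size=(800, 600),
    flags=['--hide-scrollbars'],
)
print(' '.join(command))
```

Passing the command to `subprocess.run` as a list, as the method above does, means arguments containing spaces (such as the macOS Chrome path) need no shell quoting.
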
21 changes: 21 additions & 0 deletions html2image/browsers/firefox.py
@@ -0,0 +1,21 @@
from .browser import Browser


class FirefoxHeadless(Browser):

    def __init__(self):
        raise NotImplementedError(
            "Could not make screenshot work on Firefox headless yet ...\n"
            "See https://bugzilla.mozilla.org/show_bug.cgi?id=1715450"
        )

    @property
    def executable_path(self):
        pass

    @executable_path.setter
    def executable_path(self, value):
        pass

    def render(self, **kwargs):
        pass