Update docs
perklet committed Jul 3, 2024
1 parent 0db0ea2 commit e3083d0
Showing 11 changed files with 129 additions and 32 deletions.
14 changes: 13 additions & 1 deletion README-zh.md
@@ -10,6 +10,10 @@
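Unlike other pure Python http clients such as `httpx` or `requests`, `curl_cffi` can impersonate browsers'
TLS/JA3 and HTTP/2 fingerprints. If you are blocked by some website for no obvious reason, you can give `curl_cffi` a try.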

The fingerprints in 0.6 on Windows are all wrong. If you are on Windows, please upgrade as soon as possible. Sorry for the inconvenience.

Only Python 3.8 and above are supported. Python 3.7 has officially reached its end of life.

------

<a href="https://scrapfly.io/?utm_source=github&utm_medium=sponsoring&utm_campaign=curl_cffi" target="_blank"><img src="assets/scrapfly.png" alt="Scrapfly.io" width="149"></a>
@@ -26,7 +30,7 @@ TLS/JA3 and HTTP/2 fingerprints. If you are blocked by some website for no obvious reason,

## Features

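- Supports JA3/TLS and http2 fingerprint impersonation.
- Supports JA3/TLS and http2 fingerprint impersonation, including recent browsers and custom fingerprints.
- Much faster than requests/httpx, on par with aiohttp/pycurl, see [benchmarks](https://github.com/yifeikong/curl_cffi/tree/master/benchmark).
- Mimics the requests API, so there is no need to learn another one.
- Pre-compiled, so you don't have to build from scratch on your own machine.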
@@ -83,6 +87,12 @@ print(r.json())
# Other similar values are: "safari" and "safari_ios"
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome")

# To pin a specific version, append the version number.
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome124")

# Custom fingerprints; see the examples folder for concrete usage.
r = requests.get("https://tls.browserleaks.com/json", ja3=..., akamai=...)

# Proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome110", proxies=proxies)
@@ -112,6 +122,8 @@ print(r.json())
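Browser versions are added only when their fingerprints change. If you see that a version was skipped, it is because
its fingerprints did not change; simply use the previous version together with the new headers.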

If you are trying to impersonate a target other than a browser, use `ja3=...` and `akamai=...` to specify your own customized fingerprints. See the [docs](https://curl-cffi.readthedocs.io/en/latest/impersonate.html).

- chrome99
- chrome100
- chrome101
9 changes: 7 additions & 2 deletions README.md
@@ -15,6 +15,11 @@ Unlike other pure python http clients like `httpx` or `requests`, `curl_cffi` ca
impersonate browsers' TLS/JA3 and HTTP/2 fingerprints. If you are blocked by some
website for no obvious reason, you can give `curl_cffi` a try.

The fingerprints in 0.6 on Windows are all wrong; you should update to 0.7 if you are on
Windows. Sorry for the inconvenience.

Only Python 3.8 and above are supported. Python 3.7 has reached its end of life.

------

<a href="https://scrapfly.io/?utm_source=github&utm_medium=sponsoring&utm_campaign=curl_cffi" target="_blank"><img src="https://raw.githubusercontent.com/yifeikong/curl_cffi/main/assets/scrapfly.png" alt="Scrapfly.io" width="149"></a>
@@ -34,7 +39,7 @@ If you are managing TLS/HTTP fingerprint by yourself with `curl_cffi`, they also

## Features

- Supports JA3/TLS and http2 fingerprints impersonation.
- Supports JA3/TLS and http2 fingerprint impersonation, including recent browsers and custom fingerprints.
- Much faster than requests/httpx, on par with aiohttp/pycurl, see [benchmarks](https://github.com/yifeikong/curl_cffi/tree/main/benchmark).
- Mimics requests API, no need to learn another one.
- Pre-compiled, so you don't have to compile on your machine.
@@ -130,7 +135,7 @@ Browser versions will be added **only** when their fingerprints change. If you s
chrome122, was skipped, you can simply impersonate it with your own headers and the previous version.

If you are trying to impersonate a target other than a browser, use `ja3=...` and `akamai=...`
to specify your own customized fingerprints.
to specify your own customized fingerprints. See the [docs on impersonation](https://curl-cffi.readthedocs.io/en/latest/impersonate.html) for details.

- chrome99
- chrome100
20 changes: 12 additions & 8 deletions curl_cffi/requests/__init__.py
@@ -56,7 +56,7 @@ def request(
proxy_auth: Optional[Tuple[str, str]] = None,
verify: Optional[bool] = None,
referer: Optional[str] = None,
accept_encoding: Optional[str] = "gzip, deflate, br",
accept_encoding: Optional[str] = "gzip, deflate, br, zstd",
content_callback: Optional[Callable] = None,
impersonate: Optional[Union[str, BrowserType]] = None,
ja3: Optional[str] = None,
@@ -80,7 +80,7 @@ def request(
method: http method for the request: GET/POST/PUT/DELETE etc.
url: url for the request.
params: query string for the request.
data: form values or binary data to use in body,
data: form values (dict/list/tuple) or binary data to use in body,
``Content-Type: application/x-www-form-urlencoded`` will be added if a dict is given.
json: json values to use in body, `Content-Type: application/json` will be added
automatically.
@@ -93,7 +93,7 @@ def request(
max_redirects: max redirect counts, default 30, use -1 for unlimited.
proxies: dict of proxies to use, format: ``{"http": proxy_url, "https": proxy_url}``.
proxy: proxy to use, format: "http://user:pass@proxy_url".
Can't be used with proxy parameter.
Can't be used with `proxies` parameter.
proxy_auth: HTTP basic auth for proxy, a tuple of (username, password).
verify: whether to verify https certs.
referer: shortcut for setting referer header.
@@ -104,15 +104,19 @@ def request(
ja3: ja3 string to impersonate.
akamai: akamai string to impersonate.
extra_fp: extra fingerprints options, in complement to ja3 and akamai strings.
thread: work with other thread implementations. choices: eventlet, gevent.
default_headers: whether to set default browser headers.
thread: thread engine to use for working with other thread implementations.
choices: eventlet, gevent.
default_headers: whether to set default browser headers when impersonating.
default_encoding: encoding for decoding response content if charset is not found in headers.
Defaults to "utf-8". Can be set to a callable for automatic detection.
curl_options: extra curl options to use.
http_version: limiting http version, http2 will be tries by default.
http_version: limiting http version, defaults to http2.
debug: print extra curl debug info.
interface: which interface use in request to server.
multipart: upload files using the multipart format, see.
interface: which interface to use.
cert: a tuple of (cert, key) filenames for client cert.
stream: streaming the response, default False.
max_recv_speed: maximum receive speed, bytes per second.
multipart: upload files using the multipart format, see examples for details.
Returns:
A ``Response`` object.
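The `proxy` format above embeds credentials in the URL. A minimal stdlib sketch, assuming the standard `scheme://user:password@host:port` URL layout; `build_proxy_url` is a hypothetical helper, not part of curl_cffi. Credentials are percent-encoded so that characters such as `@` or `:` in a password do not break URL parsing.

```python
from urllib.parse import quote, urlsplit


def build_proxy_url(host: str, port: int, user: str = "", password: str = "",
                    scheme: str = "http") -> str:
    """Hypothetical helper: assemble a proxy URL with optional credentials."""
    if user:
        # safe="" also escapes "/", ":" and "@" inside the credentials.
        userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    else:
        userinfo = ""
    return f"{scheme}://{userinfo}{host}:{port}"


url = build_proxy_url("localhost", 3128, "alice", "p@ss:word")
print(url)  # http://alice:p%40ss%3Aword@localhost:3128

# urlsplit recovers the pieces; username/password stay percent-encoded.
parts = urlsplit(url)
print(parts.hostname, parts.port, parts.username)  # localhost 3128 alice
```

Alternatively, the `proxy_auth=(username, password)` parameter documented above keeps credentials out of the URL entirely.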
14 changes: 8 additions & 6 deletions curl_cffi/requests/session.py
@@ -331,7 +331,7 @@ def _set_curl_options(
proxy_auth: Optional[Tuple[str, str]] = None,
verify: Optional[Union[bool, str]] = None,
referer: Optional[str] = None,
accept_encoding: Optional[str] = "gzip, deflate, br",
accept_encoding: Optional[str] = "gzip, deflate, br, zstd",
content_callback: Optional[Callable] = None,
impersonate: Optional[Union[str, BrowserType]] = None,
ja3: Optional[str] = None,
@@ -378,7 +378,7 @@ def _set_curl_options(
elif data is None:
body = b""
else:
raise TypeError("data must be dict, str, BytesIO or bytes")
raise TypeError("data must be dict/list/tuple, str, BytesIO or bytes")
if json is not None:
body = dumps(json, separators=(",", ":")).encode()
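The widened `TypeError` message lists the accepted `data` types. A hypothetical sketch of that kind of type dispatch follows; `encode_body` is an illustrative name, and this is not curl_cffi's actual implementation. It assumes dict/list/tuple values are form-urlencoded, as the `request` docstring describes. A list of tuples matters because it can express repeated keys, which a dict cannot.

```python
from io import BytesIO
from typing import Union
from urllib.parse import urlencode


def encode_body(data: Union[dict, list, tuple, str, BytesIO, bytes, None]) -> bytes:
    """Hypothetical sketch: turn a ``data`` argument into request body bytes."""
    if isinstance(data, (dict, list, tuple)):
        # urlencode accepts a mapping or a sequence of (key, value) pairs.
        return urlencode(data).encode()
    if isinstance(data, str):
        return data.encode()
    if isinstance(data, BytesIO):
        return data.getvalue()
    if isinstance(data, bytes):
        return data
    if data is None:
        return b""
    raise TypeError("data must be dict/list/tuple, str, BytesIO or bytes")


print(encode_body({"a": "1", "b": "2"}))          # b'a=1&b=2'
print(encode_body([("tag", "x"), ("tag", "y")]))  # b'tag=x&tag=y'
```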

@@ -723,15 +723,15 @@ def __init__(
created. Also, a fresh curl object will always be created when accessed
from another thread.
thread: thread engine to use for working with other thread implementations.
choices: eventlet, gevent., possible values: eventlet, gevent.
choices: eventlet, gevent.
headers: headers to use in the session.
cookies: cookies to add in the session.
auth: HTTP basic auth, a tuple of (username, password), only basic auth is supported.
proxies: dict of proxies to use, format: {"http": proxy_url, "https": proxy_url}.
proxy: proxy to use, format: "http://proxy_url".
Cannot be used with the above parameter.
proxy_auth: HTTP basic auth for proxy, a tuple of (username, password).
base_url: absolute url to use for relative urls.
base_url: absolute url to use as base for relative urls.
params: query string for the session.
verify: whether to verify https certs.
timeout: how many seconds to wait before giving up.
@@ -742,9 +742,10 @@ def __init__(
ja3: ja3 string to impersonate in the session.
akamai: akamai string to impersonate in the session.
extra_fp: extra fingerprints options, in complement to ja3 and akamai strings.
interface: which interface use in request to server.
interface: which interface to use.
default_encoding: encoding for decoding response content if charset is not found in
headers. Defaults to "utf-8". Can be set to a callable for automatic detection.
cert: a tuple of (cert, key) filenames for client cert.
Notes:
This class can be used as a context manager.
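`default_encoding` can be set to a callable for automatic detection. A minimal sketch of such a callable, assuming it receives the raw response body as `bytes` and returns an encoding name as `str` (check the signature in your installed version). `sniff_encoding` is hypothetical: it checks for a byte-order mark, then tries strict UTF-8, and falls back to `latin-1`, which can decode any byte sequence.

```python
import codecs


def sniff_encoding(content: bytes) -> str:
    """Hypothetical detector, assumed usable as ``default_encoding=sniff_encoding``."""
    # A byte-order mark identifies the encoding directly.
    if content.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if content.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        content.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so decoding never fails.
        return "latin-1"
```

It would then be passed as `default_encoding=sniff_encoding` to the session; only responses whose headers carry no charset would reach it.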
@@ -826,7 +827,7 @@ def ws_connect(
on_message: message callback, ``def on_message(ws, str)``
on_error: error callback, ``def on_error(ws, error)``
on_open: open callback, ``def on_open(ws)``
on_cloes: close callback, ``def on_close(ws)``
on_close: close callback, ``def on_close(ws)``
Other parameters are the same as ``.request``
@@ -1039,6 +1040,7 @@ def __init__(
extra_fp: extra fingerprints options, in complement to ja3 and akamai strings.
default_encoding: encoding for decoding response content if charset is not found
in headers. Defaults to "utf-8". Can be set to a callable for automatic detection.
cert: a tuple of (cert, key) filenames for client cert.
Notes:
This class can be used as a context manager, and it's recommended to use via
2 changes: 1 addition & 1 deletion docs/advanced.rst
@@ -16,7 +16,7 @@ Alternatively, you can use the low-level curl-like API:
c.setopt(CurlOpt.URL, b'https://tls.browserleaks.com/json')
c.setopt(CurlOpt.WRITEDATA, buffer)
c.impersonate("chrome120")
c.impersonate("chrome124")
c.perform()
c.close()
1 change: 1 addition & 0 deletions docs/api.rst
@@ -64,6 +64,7 @@ Enum values used by ``setopt`` and ``getinfo`` can be accessed from ``CurlOpt``
.. autoclass:: curl_cffi.CurlECode
.. autoclass:: curl_cffi.CurlHttpVersion
.. autoclass:: curl_cffi.CurlWsFlag
.. autoclass:: curl_cffi.CurlSslVersion

requests API
--------
3 changes: 2 additions & 1 deletion docs/changelog.rst
@@ -6,7 +6,8 @@ v0.7

- v0.7.0
- Added more recent impersonate versions, up to Chrome 124.
- Upgraded libcurl to 8.7.1.
- Upgraded ``libcurl`` to 8.7.1.
- Supported custom impersonation.
- Added support for lists of tuples in post fields.
- Updated header strategy: always exclude empty headers, never send Expect header.
- Changed default redirect limit to 30.
12 changes: 6 additions & 6 deletions docs/faq.rst
@@ -6,10 +6,10 @@ Why does the JA3 fingerprints change for Chrome 110+ impersonation?

This is intended.

Chrome introduces ClientHello permutation in version 110, which means the order of
Chrome introduces ``ClientHello`` permutation in version 110, which means the order of
extensions will be random, thus JA3 fingerprints will be random. So, when comparing
JA3 fingerprints of `curl_cffi` and a browser, they may differ. However, this does not
mean that TLS fingerprints will not be a problem, ClientHello extension order is just
JA3 fingerprints of ``curl_cffi`` and a browser, they may differ. However, this does not
mean that TLS fingerprints will not be a problem; ``ClientHello`` extension order is just
one factor of how servers can tell automated requests from browsers.
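To see why permutation makes JA3 unstable: a JA3 fingerprint is conventionally reported as the MD5 hash of the comma-separated JA3 string, so reordering extensions changes the hash even though the set of extensions is the same. A minimal stdlib sketch; the JA3 string reuses the okhttp example from the impersonation guide, and `ja3_hash`, `reorder_extensions` and `normalize_ja3` are hypothetical helpers. Sorting the extension list is one common normalization, shown here for illustration only.

```python
import hashlib
from typing import Iterable


def ja3_hash(ja3: str) -> str:
    # JA3 fingerprints are conventionally reported as the MD5 of the JA3 string.
    return hashlib.md5(ja3.encode()).hexdigest()


def reorder_extensions(ja3: str, order: Iterable[int]) -> str:
    # The third field (index 2) holds the dash-separated extension IDs.
    fields = ja3.split(",")
    exts = fields[2].split("-")
    fields[2] = "-".join(exts[i] for i in order)
    return ",".join(fields)


def normalize_ja3(ja3: str) -> str:
    # Sorting extension IDs maps every permutation of one set to one string.
    fields = ja3.split(",")
    fields[2] = "-".join(sorted(fields[2].split("-"), key=int))
    return ",".join(fields)


ja3 = ("771,4865-4866-4867-49195-49196-52393-49199-49200-52392-49171-49172-156-157-47-53,"
       "0-23-65281-10-11-35-16-5-13-51-45-43-21,29-23-24,0")
n_ext = len(ja3.split(",")[2].split("-"))
permuted = reorder_extensions(ja3, reversed(range(n_ext)))

print(ja3_hash(ja3) == ja3_hash(permuted))            # False: order changes the hash
print(normalize_ja3(ja3) == normalize_ja3(permuted))  # True: same extension set
```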

Roughly, this can be mitigated as follows:
@@ -71,18 +71,18 @@ to use with them, simply set ``verify=False``.
ErrCode: 92, Reason: 'HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)'
------

This error(http/2 stream 0) has been reported many times ever since curl_cffi was
This error (http/2 stream 0) has been reported many times ever since ``curl_cffi`` was
published, but I still cannot find a reproducible way to trigger it. Given that the
majority of users are behind proxies, the situation is even more difficult to deal with.

I'm not even sure whether it's a bug introduced in libcurl, curl-impersonate or curl_cffi, or
it's just a server error. Depending on your context, here are some general suggestions
for users:
for you:

- First, try removing the ``Content-Length`` header from your request.
- Check whether this error is caused by proxies; if so, use better proxies.
- If it stops working after a while, maybe you're just being blocked by a service such as Akamai.
- Force http/1.1 mode. Some websites' h2 implemetation is simple broken.
- Force http/1.1 mode. Some websites' h2 implementation is simply broken.
- See if the url works in your real browser.
- Find a stable way to reproduce it, so we can finally fix, or at least bypass it.

80 changes: 73 additions & 7 deletions docs/impersonate.rst
@@ -4,10 +4,16 @@ Impersonate guide
Supported browser versions
--------------------------

Supported impersonate versions, as supported by our `fork <https://github.com/yifeikong/curl-impersonate>`_ of `curl-impersonate <https://github.com/lwthiker/curl-impersonate>`_:
``curl_cffi`` supports the same browser versions as supported by our `fork <https://github.com/yifeikong/curl-impersonate>`_ of `curl-impersonate <https://github.com/lwthiker/curl-impersonate>`_:

However, only Chrome-like browsers are supported. Firefox support is tracked in `#59 <https://github.com/yifeikong/curl_cffi/issues/59>`_.

Browser versions will be added **only** when their fingerprints change. If you see that a version, e.g.
chrome122, was skipped, you can simply impersonate it with your own headers and the previous version.

If you are trying to impersonate a target other than a browser, use ``ja3=...`` and ``akamai=...``
to specify your own customized fingerprints. See below for details.

- chrome99
- chrome100
- chrome101
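Because a skipped version means its fingerprint matches an earlier one, the guidance above amounts to picking the newest supported target that is not newer than the browser you want. A hypothetical helper sketching that choice; `SUPPORTED_CHROME` is an illustrative subset built from versions mentioned on this page, not the authoritative list.

```python
import bisect

# Illustrative subset only; consult the documented list for the real targets.
SUPPORTED_CHROME = [99, 100, 101, 110, 120, 124]


def closest_chrome_target(version: int) -> str:
    """Hypothetical helper: newest supported ``chromeNNN`` target <= ``version``."""
    idx = bisect.bisect_right(SUPPORTED_CHROME, version)
    if idx == 0:
        raise ValueError(f"chrome{version} is older than any supported target")
    return f"chrome{SUPPORTED_CHROME[idx - 1]}"


print(closest_chrome_target(122))  # chrome120
print(closest_chrome_target(124))  # chrome124
```

The result would then be passed as `impersonate=...`, with that browser's own headers (for example its `User-Agent`) supplied yourself.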
@@ -46,17 +52,67 @@ browser versions, you can simply use ``chrome``, ``safari`` and ``safari_ios``.
requests.get(url, impersonate="chrome")
iOS has restrictions on WebView and TLS libs, so safari_x_ios should work for most apps.
If you encountered an android app with custom fingerprints, you can try the ``safari ios``
iOS has restrictions on WebView and TLS libs, so ``safari_x_ios`` should work for most apps.
If you encounter an Android app with custom fingerprints, you can try the ``safari_ios``
fingerprints given that this app should have an iOS version.

How to customize my fingerprints? e.g. okhttp
How do I use my own fingerprints instead of the built-in ones, e.g. okhttp?
------

It's not fully implemented, yet.
Use ``ja3=...``, ``akamai=...`` and ``extra_fp=...``.

You can retrieve the JA3 and Akamai strings using tools like Wireshark or from TLS fingerprinting sites.

.. code-block:: python
There are many parts in the JA3 and Akamai http2 fingerprints. Some of them can be changed,
while some can not be changed at the moment. The progress is tracked in https://github.com/yifeikong/curl_cffi/issues/194.
# OKHTTP impersonation examples
# credits: https://github.com/bogdanfinn/tls-client/blob/master/profiles/contributed_custom_profiles.go
url = "https://tls.browserleaks.com/json"
okhttp4_android10_ja3 = ",".join(
[
"771",
"4865-4866-4867-49195-49196-52393-49199-49200-52392-49171-49172-156-157-47-53",
"0-23-65281-10-11-35-16-5-13-51-45-43-21",
"29-23-24",
"0",
]
)
okhttp4_android10_akamai = "4:16777216|16711681|0|m,p,a,s"
extra_fp = {
"tls_signature_algorithms": [
"ecdsa_secp256r1_sha256",
"rsa_pss_rsae_sha256",
"rsa_pkcs1_sha256",
"ecdsa_secp384r1_sha384",
"rsa_pss_rsae_sha384",
"rsa_pkcs1_sha384",
"rsa_pss_rsae_sha512",
"rsa_pkcs1_sha512",
"rsa_pkcs1_sha1",
]
# other options:
# tls_min_version: int = CurlSslVersion.TLSv1_2
# tls_grease: bool = False
# tls_permute_extensions: bool = False
# tls_cert_compression: Literal["zlib", "brotli"] = "brotli"
# tls_signature_algorithms: Optional[List[str]] = None
# http2_stream_weight: int = 256
# http2_stream_exclusive: int = 1
# See requests/impersonate.py and tests/unittest/test_impersonate.py for more examples
}
r = requests.get(
url, ja3=okhttp4_android10_ja3, akamai=okhttp4_android10_akamai, extra_fp=extra_fp
)
print(r.json())
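Before passing custom strings, it can help to inspect what they contain. A JA3 string has five comma-separated fields (TLS version, ciphers, extensions, elliptic curves, point formats), each a dash-separated list of decimal IDs. The Akamai HTTP/2 string has four `|`-separated parts: SETTINGS, WINDOW_UPDATE increment, PRIORITY, and pseudo-header order. A stdlib-only sketch; `parse_ja3` and `parse_akamai` are hypothetical helpers, and the `;` separator between SETTINGS pairs is an assumption based on the commonly published Akamai format (the okhttp example has only one pair).

```python
from typing import Dict, List, Union

JA3_FIELDS = ("tls_version", "ciphers", "extensions", "curves", "point_formats")


def parse_ja3(ja3: str) -> Dict[str, List[int]]:
    """Hypothetical helper: split a JA3 string into its five fields."""
    fields = ja3.split(",")
    if len(fields) != len(JA3_FIELDS):
        raise ValueError(f"expected 5 comma-separated fields, got {len(fields)}")
    # Each field is a dash-separated list of decimal IDs; empty fields give [].
    return {name: [int(x) for x in value.split("-") if x]
            for name, value in zip(JA3_FIELDS, fields)}


def parse_akamai(akamai: str) -> Dict[str, Union[dict, int, str, list]]:
    """Hypothetical helper: split an Akamai HTTP/2 fingerprint into four parts."""
    settings, window_update, priority, pseudo_headers = akamai.split("|")
    return {
        # SETTINGS as "id:value" pairs, assumed to be separated by ";".
        "settings": {int(k): int(v) for k, v in
                     (pair.split(":") for pair in settings.split(";") if pair)},
        "window_update": int(window_update),
        "priority": priority,  # kept as the raw string
        "pseudo_header_order": pseudo_headers.split(","),
    }


okhttp_ja3 = ("771,4865-4866-4867-49195-49196-52393-49199-49200-52392-49171-49172-156-157-47-53,"
              "0-23-65281-10-11-35-16-5-13-51-45-43-21,29-23-24,0")
ja3_info = parse_ja3(okhttp_ja3)
print(ja3_info["tls_version"], len(ja3_info["ciphers"]), ja3_info["curves"])
# [771] 15 [29, 23, 24]
akamai_info = parse_akamai("4:16777216|16711681|0|m,p,a,s")
print(akamai_info["settings"], akamai_info["pseudo_header_order"])
# {4: 16777216} ['m', 'p', 'a', 's']
```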
The other way is to use the ``curlopt`` s to specify exactly which options you want to change.

To modify them, use ``curl.setopt(CurlOpt, value)``, for example:

@@ -116,6 +172,16 @@ randomized, due to the ``extension permutation`` feature introduced in Chrome 11
As far as we know, most websites use an allowlist, not a blocklist to filter out bot
traffic. So do not expect random ja3 fingerprints to work in the wild.

Moreover, do not generate random ja3 strings. A valid ja3 string has to satisfy certain constraints.
For example:

* TLS 1.3 ciphers must be at the front.
* The GREASE extension must come first.
* etc.

You should copy ja3 strings from sniffing tools, not generate them, unless you can make
sure all the requirements are met.
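The first rule can be checked mechanically before using a hand-edited string. A minimal sketch, assuming the TLS 1.3 cipher suites are 0x1301 to 0x1305 (4865 to 4869 in JA3's decimal notation); `tls13_ciphers_first` is hypothetical and covers only this one rule, not GREASE placement or the other constraints.

```python
# TLS 1.3 cipher suites 0x1301-0x1305, written in JA3's decimal notation.
TLS13_CIPHERS = {4865, 4866, 4867, 4868, 4869}


def tls13_ciphers_first(ja3: str) -> bool:
    """Hypothetical check: do all TLS 1.3 ciphers precede the other ciphers?"""
    ciphers = [int(c) for c in ja3.split(",")[1].split("-") if c]
    seen_other = False
    for cipher in ciphers:
        if cipher in TLS13_CIPHERS:
            if seen_other:
                return False  # a TLS 1.3 cipher appears after a non-1.3 one
        else:
            seen_other = True
    return True


good = "771,4865-4866-49195-47,0-23,29-23-24,0"
bad = "771,49195-4865-47,0-23,29-23-24,0"
print(tls13_ciphers_first(good), tls13_ciphers_first(bad))  # True False
```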

Can I change JavaScript fingerprints with this library?
------

1 change: 1 addition & 0 deletions docs/index.rst
@@ -12,6 +12,7 @@ Welcome to curl_cffi's documentation!

install
impersonate
cookies
advanced
vs-requests
faq
5 changes: 5 additions & 0 deletions docs/install.rst
@@ -1,6 +1,11 @@
Install
=======

The fingerprints in 0.6 on Windows are all wrong; you should update to 0.7 if you are on
Windows.

Only Python 3.8 and above are supported. Python 3.7 has reached its end of life.

Via pip
------

