diff --git a/README-zh.md b/README-zh.md
index ca4056a..497f830 100644
--- a/README-zh.md
+++ b/README-zh.md
@@ -10,6 +10,10 @@
 Unlike other pure Python http clients such as `httpx` and `requests`, `curl_cffi` can impersonate browsers'
 TLS/JA3 and HTTP/2 fingerprints. If you are blocked by a website for no obvious reason, give `curl_cffi` a try.
 
+The fingerprints in 0.6 on Windows are all wrong. If you are on Windows, please upgrade as soon as possible. Sorry for the inconvenience.
+
+Only Python 3.8 and above are supported. Python 3.7 has officially been retired.
+
 ------
 
 Scrapfly.io
@@ -26,7 +30,7 @@ TLS/JA3 and HTTP/2 fingerprints. If you are blocked by a website for no obvious reason,
 
 ## Features
 
-- Supports JA3/TLS and http2 fingerprint impersonation.
+- Supports JA3/TLS and http2 fingerprint impersonation, including the latest browsers and custom fingerprints.
 - Much faster than requests/httpx, on par with aiohttp/pycurl, see [benchmarks](https://github.com/yifeikong/curl_cffi/tree/master/benchmark).
 - Mimics the requests API, no need to learn a new one.
 - Pre-compiled, no need to build from scratch on your own machine.
@@ -83,6 +87,12 @@ print(r.json())
 # Other similar values are: "safari" and "safari_ios"
 r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome")
 
+# To pin a specific version, use version numbers together.
+r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome124")
+
+# Custom fingerprints, see the examples for concrete usage.
+r = requests.get("https://tls.browserleaks.com/json", ja3=..., akamai=...)
+
 # Proxies are supported
 proxies = {"https": "http://localhost:3128"}
 r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome110", proxies=proxies)
@@ -112,6 +122,8 @@ print(r.json())
 New versions are added only when the browser fingerprints change. If you see that a version was skipped, that is because
 its fingerprint did not change; simply use the previous version together with the new headers.
 
+If you are impersonating something other than a browser, use `ja3=...` and `akamai=...` to specify your custom fingerprints. See the [docs](https://curl-cffi.readthedocs.io/en/latest/impersonate.html).
+
 - chrome99
 - chrome100
 - chrome101
diff --git a/README.md b/README.md
index 9a3eeec..c722a14 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,11 @@ Unlike other pure python http clients like `httpx` or `requests`, `curl_cffi` ca
 impersonate browsers' TLS/JA3 and HTTP/2 fingerprints. If you are blocked by some
 website for no obvious reason, you can give `curl_cffi` a try.
 
+The fingerprints in 0.6 on Windows are all wrong; you should update to 0.7 if you are on
+Windows. Sorry for the inconvenience.
+
+Only Python 3.8 and above are supported. Python 3.7 has reached its end of life.
+
 ------
 
 Scrapfly.io
@@ -34,7 +39,7 @@ If you are managing TLS/HTTP fingerprint by yourself with `curl_cffi`, they also
 
 ## Features
 
-- Supports JA3/TLS and http2 fingerprints impersonation.
+- Supports JA3/TLS and http2 fingerprints impersonation, including recent browsers and custom fingerprints.
 - Much faster than requests/httpx, on par with aiohttp/pycurl, see [benchmarks](https://github.com/yifeikong/curl_cffi/tree/main/benchmark).
 - Mimics requests API, no need to learn another one.
 - Pre-compiled, so you don't have to compile on your machine.
@@ -130,7 +135,7 @@ Browser versions will be added **only** when their fingerprints change. If you s
 chrome122, were skipped, you can simply impersonate it with your own headers and the previous version.
 
 If you are trying to impersonate a target other than a browser, use `ja3=...` and `akamai=...`
-to specify your own customized fingerprints.
+to specify your own customized fingerprints. See the [docs on impersonation](https://curl-cffi.readthedocs.io/en/latest/impersonate.html) for details.
 
 - chrome99
 - chrome100
diff --git a/curl_cffi/requests/__init__.py b/curl_cffi/requests/__init__.py
index 13fb1f7..1c50cda 100644
--- a/curl_cffi/requests/__init__.py
+++ b/curl_cffi/requests/__init__.py
@@ -56,7 +56,7 @@ def request(
     proxy_auth: Optional[Tuple[str, str]] = None,
     verify: Optional[bool] = None,
     referer: Optional[str] = None,
-    accept_encoding: Optional[str] = "gzip, deflate, br",
+    accept_encoding: Optional[str] = "gzip, deflate, br, zstd",
     content_callback: Optional[Callable] = None,
     impersonate: Optional[Union[str, BrowserType]] = None,
     ja3: Optional[str] = None,
@@ -80,7 +80,7 @@ def request(
         method: http method for the request: GET/POST/PUT/DELETE etc.
         url: url for the requests.
         params: query string for the requests.
-        data: form values or binary data to use in body,
+        data: form values(dict/list/tuple) or binary data to use in body,
             ``Content-Type: application/x-www-form-urlencoded`` will be added if a dict is given.
         json: json values to use in body, `Content-Type: application/json` will be added
             automatically.
@@ -93,7 +93,7 @@ def request(
         max_redirects: max redirect counts, default 30, use -1 for unlimited.
         proxies: dict of proxies to use, format: ``{"http": proxy_url, "https": proxy_url}``.
         proxy: proxy to use, format: "http://user@pass:proxy_url".
-            Can't be used with proxy parameter.
+            Can't be used with `proxies` parameter.
         proxy_auth: HTTP basic auth for proxy, a tuple of (username, password).
         verify: whether to verify https certs.
         referer: shortcut for setting referer header.
@@ -104,15 +104,19 @@ def request(
         ja3: ja3 string to impersonate.
         akamai: akamai string to impersonate.
         extra_fp: extra fingerprints options, in complement to ja3 and akamai strings.
-        thread: work with other thread implementations. choices: eventlet, gevent.
-        default_headers: whether to set default browser headers.
+        thread: thread engine to use for working with other thread implementations.
+            choices: eventlet, gevent.
+        default_headers: whether to set default browser headers when impersonating.
         default_encoding: encoding for decoding response content if charset is not found in
             headers. Defaults to "utf-8". Can be set to a callable for automatic detection.
         curl_options: extra curl options to use.
-        http_version: limiting http version, http2 will be tries by default.
+        http_version: limiting http version, defaults to http2.
         debug: print extra curl debug info.
-        interface: which interface use in request to server.
-        multipart: upload files using the multipart format, see.
+        interface: which interface to use.
+        cert: a tuple of (cert, key) filenames for client cert.
+        stream: streaming the response, default False.
+        max_recv_speed: maximum receive speed, bytes per second.
+        multipart: upload files using the multipart format, see examples for details.
 
     Returns:
         A ``Response`` object.
diff --git a/curl_cffi/requests/session.py b/curl_cffi/requests/session.py
index 75e2d84..c5a1fe8 100644
--- a/curl_cffi/requests/session.py
+++ b/curl_cffi/requests/session.py
@@ -331,7 +331,7 @@ def _set_curl_options(
         proxy_auth: Optional[Tuple[str, str]] = None,
         verify: Optional[Union[bool, str]] = None,
         referer: Optional[str] = None,
-        accept_encoding: Optional[str] = "gzip, deflate, br",
+        accept_encoding: Optional[str] = "gzip, deflate, br, zstd",
         content_callback: Optional[Callable] = None,
         impersonate: Optional[Union[str, BrowserType]] = None,
         ja3: Optional[str] = None,
@@ -378,7 +378,7 @@ def _set_curl_options(
         elif data is None:
             body = b""
         else:
-            raise TypeError("data must be dict, str, BytesIO or bytes")
+            raise TypeError("data must be dict/list/tuple, str, BytesIO or bytes")
         if json is not None:
             body = dumps(json, separators=(",", ":")).encode()
 
@@ -723,7 +723,7 @@ def __init__(
                 created. Also, a fresh curl object will always be created when accessed from
                 another thread.
             thread: thread engine to use for working with other thread implementations.
-                choices: eventlet, gevent., possible values: eventlet, gevent.
+                choices: eventlet, gevent.
             headers: headers to use in the session.
             cookies: cookies to add in the session.
             auth: HTTP basic auth, a tuple of (username, password), only basic auth is supported.
@@ -731,7 +731,7 @@ def __init__(
             proxy: proxy to use, format: "http://proxy_url".
                 Cannot be used with the above parameter.
             proxy_auth: HTTP basic auth for proxy, a tuple of (username, password).
-            base_url: absolute url to use for relative urls.
+            base_url: absolute url to use as base for relative urls.
             params: query string for the session.
             verify: whether to verify https certs.
             timeout: how many seconds to wait before giving up.
@@ -742,9 +742,10 @@ def __init__(
             ja3: ja3 string to impersonate in the session.
             akamai: akamai string to impersonate in the session.
             extra_fp: extra fingerprints options, in complement to ja3 and akamai strings.
-            interface: which interface use in request to server.
+            interface: which interface to use.
             default_encoding: encoding for decoding response content if charset is not found in
                 headers. Defaults to "utf-8". Can be set to a callable for automatic detection.
+            cert: a tuple of (cert, key) filenames for client cert.
 
         Notes:
             This class can be used as a context manager.
@@ -826,7 +827,7 @@ def ws_connect(
             on_message: message callback, ``def on_message(ws, str)``
             on_error: error callback, ``def on_error(ws, error)``
             on_open: open callback, ``def on_open(ws)``
-            on_cloes: close callback, ``def on_close(ws)``
+            on_close: close callback, ``def on_close(ws)``
 
         Other parameters are the same as ``.request``
 
@@ -1039,6 +1040,7 @@ def __init__(
             extra_fp: extra fingerprints options, in complement to ja3 and akamai strings.
             default_encoding: encoding for decoding response content if charset is not found in
                 headers. Defaults to "utf-8". Can be set to a callable for automatic detection.
+            cert: a tuple of (cert, key) filenames for client cert.
 
         Notes:
             This class can be used as a context manager, and it's recommended to use via
diff --git a/docs/advanced.rst b/docs/advanced.rst
index 0946ade..d1f0fd8 100644
--- a/docs/advanced.rst
+++ b/docs/advanced.rst
@@ -16,7 +16,7 @@ Alternatively, you can use the low-level curl-like API:
     c.setopt(CurlOpt.URL, b'https://tls.browserleaks.com/json')
     c.setopt(CurlOpt.WRITEDATA, buffer)
 
-    c.impersonate("chrome120")
+    c.impersonate("chrome124")
 
     c.perform()
     c.close()
diff --git a/docs/api.rst b/docs/api.rst
index acf5d31..64f05b4 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -64,6 +64,7 @@ Enum values used by ``setopt`` and ``getinfo`` can be accessed from ``CurlOpt``
 .. autoclass:: curl_cffi.CurlECode
 .. autoclass:: curl_cffi.CurlHttpVersion
 .. autoclass:: curl_cffi.CurlWsFlag
+.. autoclass:: curl_cffi.CurlSslVersion
 
 requests API
 --------
diff --git a/docs/changelog.rst b/docs/changelog.rst
index 05a2f0c..96ce6b5 100644
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -6,7 +6,8 @@ v0.7
 
 - v0.7.0
     - Added more recent impersonate versions, up to Chrome 124.
-    - Upgraded libcurl to 8.7.1.
+    - Upgraded ``libcurl`` to 8.7.1.
+    - Supported custom impersonation.
     - Added support for list of tuple in post fields.
     - Updated header strategy: always exclude empty headers, never send Expect header.
     - Changed default redirect limit to 30.
diff --git a/docs/faq.rst b/docs/faq.rst
index 7e5514c..4edaa33 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -6,10 +6,10 @@ Why does the JA3 fingerprints change for Chrome 110+ impersonation?
 
 This is intended.
 
-Chrome introduces ClientHello permutation in version 110, which means the order of
+Chrome introduces ``ClientHello`` permutation in version 110, which means the order of
 extensions will be random, thus JA3 fingerprints will be random. So, when comparing
-JA3 fingerprints of `curl_cffi` and a browser, they may differ. However, this does not
-mean that TLS fingerprints will not be a problem, ClientHello extension order is just
+JA3 fingerprints of ``curl_cffi`` and a browser, they may differ. However, this does not
+mean that TLS fingerprints will not be a problem, ``ClientHello`` extension order is just
 one factor of how servers can tell automated requests from browsers.
 
 Roughly, this can be mitigated like:
@@ -71,18 +71,18 @@ to use with them, simply set ``verify=False``.
 ErrCode: 92, Reason: 'HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)'
 ------
 
-This error(http/2 stream 0) has been reported many times ever since curl_cffi was
+This error(http/2 stream 0) has been reported many times ever since ``curl_cffi`` was
 published, but I still can not find a reproducible way to trigger it. Given that the
 majority users are behind proxies, the situation is even more difficult to deal with.
 I'm even not sure it's a bug introduced in libcurl, curl-impersonate or curl_cffi, or
 it's just a server error. Depending on your context, here are some general suggestions
-for users:
+for you:
 
 - First, try removing the ``Content-Length`` header from you request.
 - Try to see if this error was caused by proxies, if so, use better proxies.
 - If it stops working after a while, maybe you're just being blocked by, such as, Akamai.
-- Force http/1.1 mode. Some websites' h2 implemetation is simple broken.
+- Force http/1.1 mode. Some websites' h2 implementation is simply broken.
 - See if the url works in your real browser.
 - Find a stable way to reproduce it, so we can finally fix, or at least bypass it.
 
diff --git a/docs/impersonate.rst b/docs/impersonate.rst
index d20e8e9..b364775 100644
--- a/docs/impersonate.rst
+++ b/docs/impersonate.rst
@@ -4,10 +4,16 @@ Impersonate guide
 Supported browser versions
 --------------------------
 
-Supported impersonate versions, as supported by our `fork `_ of `curl-impersonate `_:
+``curl_cffi`` supports the same browser versions as supported by our `fork `_ of `curl-impersonate `_:
 
 However, only Chrome-like browsers are supported. Firefox support is tracked in `#59 `_.
 
+Browser versions will be added **only** when their fingerprints change. If you see that a version, e.g.
+chrome122, was skipped, you can simply impersonate it with your own headers and the previous version.
+
+If you are trying to impersonate a target other than a browser, use ``ja3=...`` and ``akamai=...``
+to specify your own customized fingerprints. See below for details.
+
 - chrome99
 - chrome100
 - chrome101
@@ -46,17 +52,67 @@ browser versions, you can simply use ``chrome``, ``safari`` and ``safari_ios``.
 
     requests.get(url, impersonate="chrome")
 
-iOS has restrictions on WebView and TLS libs, so safari_x_ios should work for most apps.
-If you encountered an android app with custom fingerprints, you can try the ``safari ios``
+iOS has restrictions on WebView and TLS libs, so ``safari_x_ios`` should work for most apps.
+If you encountered an android app with custom fingerprints, you can try the ``safari_ios``
 fingerprints given that this app should have an iOS version.
 
-How to customize my fingerprints? e.g. okhttp
+How to use my own fingerprints other than the builtin ones? e.g. okhttp
 ------
 
-It's not fully implemented, yet.
+Use ``ja3=...``, ``akamai=...`` and ``extra_fp=...``.
+
+You can retrieve the JA3 and Akamai strings using tools like Wireshark or from TLS fingerprinting sites.
+
+.. code-block:: python
 
-There are many parts in the JA3 and Akamai http2 fingerprints. Some of them can be changed,
-while some can not be changed at the moment. The progress is tracked in https://github.com/yifeikong/curl_cffi/issues/194.
+    # OKHTTP impersonation examples
+    # credits: https://github.com/bogdanfinn/tls-client/blob/master/profiles/contributed_custom_profiles.go
+
+    url = "https://tls.browserleaks.com/json"
+
+    okhttp4_android10_ja3 = ",".join(
+        [
+            "771",
+            "4865-4866-4867-49195-49196-52393-49199-49200-52392-49171-49172-156-157-47-53",
+            "0-23-65281-10-11-35-16-5-13-51-45-43-21",
+            "29-23-24",
+            "0",
+        ]
+    )
+
+    okhttp4_android10_akamai = "4:16777216|16711681|0|m,p,a,s"
+
+    extra_fp = {
+        "tls_signature_algorithms": [
+            "ecdsa_secp256r1_sha256",
+            "rsa_pss_rsae_sha256",
+            "rsa_pkcs1_sha256",
+            "ecdsa_secp384r1_sha384",
+            "rsa_pss_rsae_sha384",
+            "rsa_pkcs1_sha384",
+            "rsa_pss_rsae_sha512",
+            "rsa_pkcs1_sha512",
+            "rsa_pkcs1_sha1",
+        ]
+        # other options:
+        # tls_min_version: int = CurlSslVersion.TLSv1_2
+        # tls_grease: bool = False
+        # tls_permute_extensions: bool = False
+        # tls_cert_compression: Literal["zlib", "brotli"] = "brotli"
+        # tls_signature_algorithms: Optional[List[str]] = None
+        # http2_stream_weight: int = 256
+        # http2_stream_exclusive: int = 1
+
+        # See requests/impersonate.py and tests/unittest/test_impersonate.py for more examples
+    }
+
+
+    r = requests.get(
+        url, ja3=okhttp4_android10_ja3, akamai=okhttp4_android10_akamai, extra_fp=extra_fp
+    )
+    print(r.json())
+
+The other way is to use the ``curlopt`` s to specify exactly which options you want to change.
 
 To modify them, use ``curl.setopt(CurlOpt, value)``, for example:
 
@@ -116,6 +172,16 @@ randomized, due to the ``extension permutation`` feature introduced in Chrome 11
 As far as we know, most websites use an allowlist, not a blocklist to filter out bot
 traffic. So do not expect random ja3 fingerprints would work in the wild.
 
+Moreover, do not generate random ja3 strings. There are certain limits for a valid ja3 string.
+For example:
+
+* TLS 1.3 ciphers must be at the front.
+* GREASE extension must be the first.
+* etc.
+
+You should copy ja3 strings from sniffing tools, not generate them, unless you can make
+sure all the requirements are met.
+
 Can I change JavaScript fingerprints with this library?
 ------
 
diff --git a/docs/index.rst b/docs/index.rst
index d127e8a..e9b0abd 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -12,6 +12,7 @@ Welcome to curl_cffi's documentation!
 
    install
    impersonate
+   cookies
    advanced
    vs-requests
    faq
diff --git a/docs/install.rst b/docs/install.rst
index 00d46ca..15b58dd 100644
--- a/docs/install.rst
+++ b/docs/install.rst
@@ -1,6 +1,11 @@
 Install
 =======
 
+The fingerprints in 0.6 on Windows are all wrong; you should update to 0.7 if you are on
+Windows.
+
+Only Python 3.8 and above are supported. Python 3.7 has reached its end of life.
+
 Via pip
 ------