-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid HTTP request in some cases on replaceDocuments #207
Comments
The issue stems from the logic when we accumulate headers continuously and reuse curl connections set up previously. This means that all headers from the old connection merge into the new one. If we made an
To fix this, we need to ensure that on long-running scripts, we don't infinitely extend headers with the same type. We should consider using a key-value map instead. |
I have discovered that while dumping curl info with a curl raw request by adding curl_setopt($conn, CURLINFO_HEADER_OUT, true);
...
var_dump(curl_getopt($conn)); I discovered that we send proper headers, while still getting a 400 error from the daemon, which is weird: {
"url": "http://manticore:9308/bulk",
"content_type": "application/json; charset=UTF-8",
"http_code": 400,
"header_size": 241,
"request_size": 150,
"filetime": -1,
"ssl_verify_result": 0,
"redirect_count": 0,
"total_time": 0.063445,
"namelookup_time": 0.000052,
"connect_time": 0.000052,
"pretransfer_time": 0.000067,
"size_upload": 729799,
"size_download": 31,
"speed_download": 492,
"speed_upload": 11584111,
"download_content_length": 31,
"upload_content_length": 729799,
"starttransfer_time": 0.000068,
"redirect_time": 0,
"redirect_url": "",
"primary_ip": "192.168.128.2",
"certinfo": [],
"primary_port": 9308,
"local_ip": "",
"local_port": -1,
"http_version": 2,
"protocol": 1,
"ssl_verifyresult": 0,
"scheme": "HTTP",
"appconnect_time_us": 0,
"connect_time_us": 52,
"namelookup_time_us": 52,
"pretransfer_time_us": 67,
"redirect_time_us": 0,
"starttransfer_time_us": 68,
"total_time_us": 63445,
"request_header": "POST /bulk HTTP/1.1\r\nHost: manticore:9308\r\nAccept: */*\r\nAccept-Encoding: deflate, gzip\r\nContent-Type: application/x-ndjson\r\nContent-Length: 729799\r\n\r\n",
"effective_method": "POST"
} I tried to write a script that reproduces it with dumped queries into a file: #!/usr/bin/env bash
endpoint="http://manticore:9308"
requests_prefix="/tmp/data.request."
# Function to send a request and print the result
send_request() {
local url=${endpoint}/bulk
curl --silent --show-error --write-out "%{http_code}" \
--data-binary @"$1" -H 'Content-type: application/x-ndjson' \
--output /dev/null "${url}"
}
concurrency=$(ls "$requests_prefix"* | wc -l | xargs)
# Loop indefinitely
while true; do
# Array to store PIDs of curl processes
pids=()
# Check if we've reached the maximum number of parallel requests
while [ ${#pids[@]} -ge ${concurrency} ]; do
# Wait for any child process to exit
wait -n
# Remove exited child process from the array
pids=("${pids[@]}")
done
# Send the request in detached mode and store the PID
for f in "$requests_prefix"*; do
send_request "${f}" &
pids+=($!)
done
# Wait for all child processes to exit
for pid in "${pids[@]}"; do
wait "${pid}"
done
done But unfortunately, it does not fail at all. |
After multiple test runs, it seems like the following line helps stabilize the issue: curl_setopt($conn, CURLOPT_FRESH_CONNECT, true); |
As we discussed, we should implement PHP code that will attempt to replicate it. |
When running the following script, in some cases, EVEN when the request data is valid, the daemon returns an invalid HTTP request error. It happens during concurrency. Sometimes, there is one more error that appears – No clue about the reason, but this is what we should investigate.
The table is simple: create table test(title string); And create a `run` script with `chmod +x run` after and put the contents with the changed host to manticore:#!/usr/bin/env php
<?php
$host = 'http://manticore:9308';
$bulkRequest = '{"replace":{"index":"test","id":1,"doc":{"title":"Hello World"}}}';
$concurrency = 24;
while (true) {
$pids = [];
// Firs send the request
$endpoint = "$host/sql";
$body = "query=SELECT * FROM test LIMIT 20";
$contentType = "application/json";
sendRequest($endpoint, $body, $contentType);
$contentType = 'application/x-ndjson';
for ($i = 0; $i < $concurrency; $i++) {
$pid = pcntl_fork();
if ($pid === 0) {
usleep(random_int(500000, 1000000));
$endpoint = "$host/bulk";
$exit = sendRequest($endpoint, $bulkRequest, $contentType);
exit($exit);
} else {
// Parent process
$pids[] = $pid;
}
}
foreach ($pids as $pid) {
pcntl_waitpid($pid, $status, WNOHANG);
}
}
function sendRequest($endpoint, $body, $contentType) {
static $ch;
if (!isset($ch)) {
$ch = curl_init();
}
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_URL, $endpoint);
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["Content-Type: $contentType"]);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_exec($ch);
$responseCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_errno($ch);
//curl_close($ch);
echo "$responseCode - "
. ($responseCode === 200 ? 'OK' : 'ERR: ' . json_encode(curl_getinfo($ch)))
. "\n";
flush();
if ($error > 0) {
$errorText = curl_error($ch);
echo "CURL error occured: $errorText\n";
curl_close($ch);
$ch = null;
}
return $responseCode === 200 ? 0 : 1;
} Once we run it with Example of usage and results: $ ./run | grep '^400'
400 - ERR: {"url":"http:\/\/manticore:9308\/bulk","content_type":"application\/json; charset=UTF-8","http_code":400,"header_size":241,"request_size":180,"filetime":-1,"ssl_verify_result":0,"redirect_count":0,"total_time":0.00083,"namelookup_time":8.0e-5,"connect_time":8.0e-5,"pretransfer_time":0.000121,"size_upload":66,"size_download":25,"speed_download":25000,"speed_upload":66000,"download_content_length":25,"upload_content_length":66,"starttransfer_time":0.000804,"redirect_time":0,"redirect_url":"","primary_ip":"192.168.128.3","certinfo":[],"primary_port":9308,"local_ip":"","local_port":-1,"http_version":2,"protocol":1,"ssl_verifyresult":0,"scheme":"HTTP","appconnect_time_us":0,"connect_time_us":80,"namelookup_time_us":80,"pretransfer_time_us":121,"redirect_time_us":0,"starttransfer_time_us":804,"total_time_us":830,"request_header":"POST \/bulk HTTP\/1.1\r\nHost: manticore:9308\r\nAccept: *\/*\r\nContent-Type: application\/x-ndjson\r\nContent-Length: 66\r\n\r\n","effective_method":"POST"} |
seems error in the script if I remove
I not sure how |
found the issue report about the curl and php |
I think it's a good idea to get rid of the static cURL instance and instead keep it per object instance: |
@donhardman Do we really need to get rid of that? I mean those persistent connections were declared as one of the client's 'features'. If we don't want them to be used for a specific application, we can just disable them through the client's config. |
We should keep it, but in a proper way – use a non-static variable that sticks to the class. That way, when we don't create multiple instances of the Client, it will still work as expected. However, when we do create instances, we're probably doing it on purpose, so we expect not to keep the original state from some random instance used before and not in the current namespace (like a fork). |
The case here – the code with forking is implemented correctly. I did not use a client created from the parent process, and instead created a new Client instance inside the forked process. However, since we have a static variable, we are using it from the parent process, which causes side effects. To resolve this issue properly, we should keep the same approach but use an instance variable instead of a static variable. |
The related PR is https://github.com/manticoresoftware/manticoresearch-php/pull/209/files @donhardman pls review |
Looks good! I've double-checked everything, and it all looks great. We can close this issue once the pull request is merged. 👍 |
Done in #209 |
As we were updating the docs with recalculated embeddings in the Manticore GitHub search demo, we encountered an issue where
replaceDocuments
was producing an "Invalid HTTP method" error. We should investigate this and try to understand the reason behind it.The text was updated successfully, but these errors were encountered: