
IO Error: Could not set lock on file "/var/lib/postgresql/data/duckdb_cache/.lock"... #517

Open
XiangyuFan17 opened this issue Jan 3, 2025 · 2 comments

Comments

@XiangyuFan17

Description

Error 1:
Executor Error: DuckDB re-planning failed: IO Error: Could not set lock on file "/var/lib/postgresql/data/duckdb_cache/.lock": Conflicting lock is held in /usr/lib/postgresql/16/bin/postgres (PID 14334) by user root. See also https://duckdb.org/docs/connect/concurrency

Error 2:
ERROR: (PGDuckDB/CreatePlan) Prepared query returned an error: 'IO Error: Could not set lock on file "/var/lib/postgresql/data/duckdb_cache/.lock": Conflicting lock is held in /usr/lib/postgresql/16/bin/postgres (PID 14283) by user root. See also https://duckdb.org/docs/connect/concurrency

Hey guys, I am currently using the pg_duckdb extension as a data transfer tool to read data from a data warehouse into PostgreSQL. Everything works fine until I try to run multiple statements of the form

CREATE TABLE tableA AS SELECT * FROM read_parquet('$path') AS (...);

concurrently; they then fail with Error 1 and Error 2.
So I checked pg_stat_activity and found that PID 14334 was running another CREATE TABLE statement.
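
For reference, a check along these lines against pg_stat_activity shows what the backend holding the lock is running (a minimal sketch; the exact query may have differed, and the PID is the one reported in the error):

SELECT pid, state, query
FROM pg_stat_activity
WHERE pid = 14334;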

My question is: Does pg_duckdb support multiple concurrent write operations in PostgreSQL?

@XiangyuFan17 XiangyuFan17 changed the title [bug/feat]: IO Error: Could not set lock on file "/var/lib/postgresql/data/duckdb_cache/.lock"... IO Error: Could not set lock on file "/var/lib/postgresql/data/duckdb_cache/.lock"... Jan 3, 2025
@JelteF
Collaborator

JelteF commented Jan 3, 2025

That seems like a bug in our HTTP caching logic. @mkaruza, pinging you because you worked a lot on that.

Could you share some more details on when this happens?

  1. Could you share the actual commands that you run?
  2. Is the filename that you're using in read_parquet the same for both of the concurrent queries? Or are they different files?

@XiangyuFan17
Author

XiangyuFan17 commented Jan 6, 2025

@JelteF Sorry for the late response.

  1. Here are some of the actual commands:
CREATE TABLE public.tableA
AS
SELECT * FROM
read_parquet('s3://$bucketA/Database/dfs/cicc_esg_indicator_detail/*.parquet')
AS (
   "globalid" TEXT,
   "company_id" TEXT,
   "report_year" INT,
   "report_name" TEXT,
   "indicator_code" TEXT
);
CREATE TABLE public.tableB
AS
SELECT * FROM read_parquet('s3://$bucketA/Database/dds/company/*.parquet')
AS (
   "company_id" TEXT,
   "company_name" TEXT,
   "social_credit_code" TEXT,
   "list_flag" BOOLEAN,
   "u_id" TEXT,
   "u_storetime" TIMESTAMP,
   "u_sourcetime" TIMESTAMP,
   "u_invalid" INT,
   "u_updatetime" TIMESTAMP
);

There were a few more SQL statements, but all of them follow exactly the same format.

  2. No, they are not the same. Each SQL statement reads from a unique AWS S3 path and creates a table with a unique name in PostgreSQL.
