The REGEXP_REPLACE function is called the same way in ClickBench query 28 with Postgres, DuckDB, and ClickHouse. However, if we attempt the same call with SuperDB's REGEXP_REPLACE function it currently causes a parse error and significant adjustments need to be made to get it to return the same query result.
Strictly speaking, it appears REGEXP_REPLACE is not a formal part of the SQL spec. However, we've got a general goal to make SuperDB as Postgres-compatible as we can so users can see it as a viable drop-in replacement, such as for running their existing BI queries. I can see from the Postgres docs that their parameters cover even more ground than what's used by this query. Below I'll share some details about my experience as a user getting this one to work.
And here's the parse error from SuperDB if I attempt to execute the same query.
$ super -version
Version: v1.18.0-215-g1d783cc2
$ super -c "SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') FROM 'referer.csv' FORMAT csv"
parse error at line 1, column 50:
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') FROM 'referer.csv' FORMAT csv
=== ^ ===
One way I got it to work is by surrounding the regexp in /, but this required:
Adding a \ before each attempt to match a literal / inside the regexp
Changing the reference to the substring match of the parenthesized subexpression from \1 to \$1
$ super -c "SELECT REGEXP_REPLACE(Referer, /^https?:\/\/(?:www\.)?([^\/]+)\/.*$/, '\$1') FROM 'referer.csv' FORMAT csv"
{regexp_replace:"go.mail"}
{regexp_replace:"tambov.irr.ru"}
{regexp_replace:"state=19945206"}
{regexp_replace:"circles"}
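For comparison, the same escaping burden shows up in sed when / is used as the delimiter. The following is a minimal sketch, not a SuperDB example: the URL is a made-up input, and because sed -E uses POSIX extended regular expressions (which lack the (?:...) non-capturing group), the optional www. prefix becomes capturing group 1 and the host becomes \2.

```shell
# Hypothetical input URL, chosen only for illustration.
url='https://www.example.com/some/path'

# With "/" as the delimiter, every literal "/" in the pattern needs a backslash.
slash_delim=$(printf '%s\n' "$url" | sed -E 's/^https?:\/\/(www\.)?([^\/]+)\/.*$/\2/')

# With "#" as the delimiter, the pattern can be written without those escapes.
other_delim=$(printf '%s\n' "$url" | sed -E 's#^https?://(www\.)?([^/]+)/.*$#\2#')

printf '%s\n%s\n' "$slash_delim" "$other_delim"
```

Both commands print example.com. Only the amount of escaping differs, which is the same trade-off as between the /-delimited and single-quoted forms of the regexp.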
I also was able to get it working by keeping the regexp as a string, but this required adding a \\ escape at the location of the parse error in addition to still making the \1 to \$1 change.
$ super -c "SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\\\.)?([^/]+)/.*$', '\$1') FROM 'referer.csv' FORMAT csv"
{regexp_replace:"go.mail"}
{regexp_replace:"tambov.irr.ru"}
{regexp_replace:"state=19945206"}
{regexp_replace:"circles"}
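Part of the escaping in the command above is consumed by the shell before super ever sees the query. Assuming a POSIX-style shell such as bash, a backslash inside double quotes is special only before $, `, ", \, and a newline. The sketch below prints what the two string literals look like after the shell has processed them; the variable names are just for illustration.

```shell
# The same fragments as in the command above, inside double quotes.
pattern="^https?://(?:www\\\.)?([^/]+)/.*$"
replacement="\$1"

# printf with %s prints its arguments verbatim, with no further backslash handling.
printf 'pattern:     %s\n' "$pattern"
printf 'replacement: %s\n' "$replacement"
# pattern:     ^https?://(?:www\\.)?([^/]+)/.*$
# replacement: $1
```

So the query text that reaches super contains www\\. in the pattern and $1 as the replacement. The \$ in the command line is only there to stop the shell from expanding $1 as a positional parameter.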
I don't claim perfect knowledge of the history of / as a regexp delimiter, but Wikipedia notes its common use with sed and Perl, and I'm also familiar with it from JavaScript. Meanwhile, all the major SQL implementations I can spot show just single-quoted strings. Likewise, the \1 syntax seems to be consistently favored for referencing a match of a parenthesized subexpression. Therefore I expect we'd want to adopt these conventions, either in addition to or instead of the current ones, in the pursuit of SQL compatibility.
Repro is with super commit 1d783cc. To simplify, I'll use the attached referer.csv test data. DuckDB returns the expected result using the same REGEXP_REPLACE call as in ClickBench query 28.