-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add join() function #82
base: master
Are you sure you want to change the base?
Conversation
Author: Manuel Kniep <[email protected]>
…e memory for pairs array allocated than actually needed)
Codecov Report
@@ Coverage Diff @@
## master #82 +/- ##
==========================================
+ Coverage 91.59% 91.65% +0.06%
==========================================
Files 10 10
Lines 1940 2002 +62
==========================================
+ Hits 1777 1835 +58
- Misses 163 167 +4
Continue to review full report at Codecov.
|
Please, merge in changes from the master. Can't compile on 12dev ( error: unknown type name 'FunctionCallInfoData'). |
@@ -46,6 +46,30 @@ | |||
query("SELECT * FROM each(NULL::#{type})").should match [] | |||
end | |||
|
|||
it 'should join multiple istores' do | |||
query("SELECT * FROM join('0 => 1, 3 => 1, 9 => 1'::#{type}, '2 => 2, 3 => 2, 8 => 2, 10=>2'::#{type})t(key int, a bigint, b bigint);").should match \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comma at the end of query gets doubled by dumbo.
* executor will treat the function result as an empty set. | ||
*/ | ||
static void | ||
prepTuplestoreResult(FunctionCallInfo fcinfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like the code "borrowed" from dblink.c, perhaps we should credit them for it in the file header.
However, I think we can get rid of this function and just inline it (as it is used only once).
input = PG_GETARG_ARRAYTYPE_P(0); | ||
i_eltype = ARR_ELEMTYPE(input); | ||
get_typlenbyvalalign(i_eltype, &i_typlen, &i_typbyval, &i_typalign); | ||
deconstruct_array(input, i_eltype, i_typlen, i_typbyval, i_typalign, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should pass NULL instead of i_nulls; the functions is declared variadic, but the elements of the input array may still be null, which we likely need to prevent.
break; | ||
|
||
dvalues[0] = Int32GetDatum(current_key); | ||
tuple = heap_form_tuple(tupdesc, dvalues, nulls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... so we just toss a row of N + 1 64bit integers and hope PostgreSQL will fit it into whatever type description the caller provides. That doesn't produce meaningful results unless the caller somehow guessed it needs to provide a description of N+1 integers, and actually crashes when the caller is creative and provides, say, an array or other varlena as one of the output types:
alexk@[local]/postgres # select * from join('1=>1, 2=>1', '1=>3', '3=>4') as t(a text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Instead of doing this, we should declare the function in SQL as returns table (key bigint, values bigint[])
and rely on this structure inside the function, calling construct_array to produce values argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree that a crash is not nice but what we want is a result of n+1 columns
maybe we can create separate function for the common case joining upto 10 istores and leave
the array option for if one need to join more than 10 istores.
Or can we check that that the type description makes sense and bail out with a descriptive error if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can check the type description to have N + 1 bigint columns and emit an error otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the question is why do you need a multi-column result? it's also trivial to convert an array representation to the additional columns by selecting individual elements of the array, i.e select key, values[0], values[1] from (... .);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmh that's actually a good argument, let's do that then for simplicity
@@ -0,0 +1,4 @@ | |||
CREATE FUNCTION join(VARIADIC istores bigistore[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a cool name, but I'd go for something less common, i.e. istore_join
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should fix the cases when it crashes, change the name and fix the output type to something like (bigint, bigint[]).
pairs[i]++; | ||
} | ||
/* if current_key is to big, all already set non null values must be reset */ | ||
else if(current_key > pairs[i]->key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we compare for greater than, but not less than (the equal case is covered above). I assume this code produces the join result, but it seems complicated. Perhaps a comment would help...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have equal, greater than and else so imho all cases are covered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, but I couldn't figure out why do we specifically compare for greater then and leave the else for the less then. A comment before the outer for loop would definitely help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true let me think of a good comment
Rebased and a bit refactored version of #54. Couple of fixes include:
pairs
array. It used to allocate more memory than it actually needed;CreateTupleDescCopy()
call;join()
from previous PR was removed completely, andmjoin()
renamed tojoin()
as it is more universal.Also this PR bumps version, so if we want to incorporate more changes to the SQL part of the extension, we can merge them together before merging to the master.