Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add join() function #82

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Add join() function #82

wants to merge 3 commits into from

Conversation

zilder
Copy link
Contributor

@zilder zilder commented Feb 13, 2019

Rebased and a bit refactored version of #54. Couple of fixes include:

  • fix allocation size for pairs array. It used to allocate more memory than it actually needed;
  • remove redundant CreateTupleDescCopy() call;
  • original function join() from previous PR was removed completely, and mjoin() renamed to join() as it is more universal.

Also this PR bumps version, so if we want to incorporate more changes to the SQL part of the extension, we can merge them together before merging to the master.

@codecov-io
Copy link

codecov-io commented Feb 13, 2019

Codecov Report

Merging #82 into master will increase coverage by 0.06%.
The diff coverage is 93.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #82      +/-   ##
==========================================
+ Coverage   91.59%   91.65%   +0.06%     
==========================================
  Files          10       10              
  Lines        1940     2002      +62     
==========================================
+ Hits         1777     1835      +58     
- Misses        163      167       +4
Impacted Files Coverage Δ
src/bigistore.c 95.44% <93.54%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e485486...2651759. Read the comment docs.

@alexeyklyukin
Copy link
Contributor

Please, merge in changes from the master. Can't compile on 12dev ( error: unknown type name 'FunctionCallInfoData').

@@ -46,6 +46,30 @@
query("SELECT * FROM each(NULL::#{type})").should match []
end

it 'should join multiple istores' do
query("SELECT * FROM join('0 => 1, 3 => 1, 9 => 1'::#{type}, '2 => 2, 3 => 2, 8 => 2, 10=>2'::#{type})t(key int, a bigint, b bigint);").should match \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comma at the end of query gets doubled by dumbo.

* executor will treat the function result as an empty set.
*/
static void
prepTuplestoreResult(FunctionCallInfo fcinfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like the code "borrowed" from dblink.c, perhaps we should credit them for it in the file header.
However, I think we can get rid of this function and just inline it (as it is used only once).

input = PG_GETARG_ARRAYTYPE_P(0);
i_eltype = ARR_ELEMTYPE(input);
get_typlenbyvalalign(i_eltype, &i_typlen, &i_typbyval, &i_typalign);
deconstruct_array(input, i_eltype, i_typlen, i_typbyval, i_typalign,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should pass NULL instead of i_nulls; the functions is declared variadic, but the elements of the input array may still be null, which we likely need to prevent.

break;

dvalues[0] = Int32GetDatum(current_key);
tuple = heap_form_tuple(tupdesc, dvalues, nulls);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... so we just toss a row of N + 1 64bit integers and hope PostgreSQL will fit it into whatever type description the caller provides. That doesn't produce meaningful results unless the caller somehow guessed it needs to provide a description of N+1 integers, and actually crashes when the caller is creative and provides, say, an array or other varlena as one of the output types:

alexk@[local]/postgres # select * from join('1=>1, 2=>1', '1=>3', '3=>4') as t(a text);
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Instead of doing this, we should declare the function in SQL as returns table (key bigint, values bigint[]) and rely on this structure inside the function, calling construct_array to produce values argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that a crash is not nice but what we want is a result of n+1 columns
maybe we can create separate function for the common case joining upto 10 istores and leave
the array option for if one need to join more than 10 istores.
Or can we check that that the type description makes sense and bail out with a descriptive error if not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can check the type description to have N + 1 bigint columns and emit an error otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the question is why do you need a multi-column result? it's also trivial to convert an array representation to the additional columns by selecting individual elements of the array, i.e select key, values[0], values[1] from (... .);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmh that's actually a good argument, let's do that then for simplicity

@@ -0,0 +1,4 @@
CREATE FUNCTION join(VARIADIC istores bigistore[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a cool name, but I'd go for something less common, i.e. istore_join.

Copy link
Contributor

@alexeyklyukin alexeyklyukin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix the cases when it crashes, change the name and fix the output type to something like (bigint, bigint[]).

pairs[i]++;
}
/* if current_key is to big, all already set non null values must be reset */
else if(current_key > pairs[i]->key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we compare for greater than, but not less than (the equal case is covered above). I assume this code produces the join result, but it seems complicated. Perhaps a comment would help...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have equal, greater than and else so imho all cases are covered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, but I couldn't figure out why do we specifically compare for greater then and leave the else for the less then. A comment before the outer for loop would definitely help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true let me think of a good comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants