odd join behavior #572

sggraham32 · 2012-12-04T21:49:21Z

Set up: include the attached file in a webapp subdirectory called testfeed that was included in the ql.io discuss forum in a post by me with the same subject line.

I initially wrote a simple join between the atom:feed and the mapping info, using a script variable to contain the map:

mapping = [
{
"name" : "A",
"user" : "engineering"
},
{
"name" : "B",
"user" : "pre-sales"
}
];

create table test.feed on select get from "http://localhost:8080/testfeed/test.xml" resultset "atom:feed.atom:entry";

select test.atom:content.vsc:Thing.vsc:displayName as t:displayName,
umap.name as umap_name,
test.atom:content.vsc:Thing.inst:p1 as t:p1,
umap.user as user

from test.feed as test, mapping as umap
where umap.name = test.atom:content.vsc:Thing.vsc:displayName;

The response is what I expected:
[
{
"t:displayName": "A",
"umap_name": "A",
"t:p1": "A property1",
"user": "engineering"
},
{
"t:displayName": "B",
"umap_name": "B",
"t:p1": "B property1",
"user": "pre-sales"
}
]

When I converted the script to instead pull the mapping file in from an http call (to the same data contained in a file in the /testfeed webapp called user-map.json), I get different join results:
create table mapping on select get from "http://localhost:8080/testfeed/user-map.json" ;

The exact same select statement seems to create a union instead of a join:

[

{
    "t:displayName": "A",
    "umap_name": "A",
    "t:p1": "A property1",
    "user": "engineering"
},
{
    "t:displayName": "A",
    "umap_name": "B",
    "t:p1": "A property1",
    "user": "pre-sales"
},
{
    "t:displayName": "B",
    "umap_name": "A",
    "t:p1": "B property1",
    "user": "engineering"
},
{
    "t:displayName": "B",
    "umap_name": "B",
    "t:p1": "B property1",
    "user": "pre-sales"
},
{
    "t:displayName": "C",
    "umap_name": "A",
    "t:p1": "C property1",
    "user": "engineering"
},
{
    "t:displayName": "C",
    "umap_name": "B",
    "t:p1": "C property1",
    "user": "pre-sales"
}

]

The data is really the same in both situations, I would expect the result of the select statement to be the same too.

The text was updated successfully, but these errors were encountered:

mbellmbell · 2012-12-04T22:09:37Z

While you're waiting for a real answer, maybe try assigning each select statement to a variable and then doing the join between those variables. My (very limited) understanding is that the ql.io query optimizer will query sourceA once and then query sourceB once per row from sourceA. If sourceB returns all rows (which it does in your case), then you'll end up with a cross-product.

We had luck with:

A = select * from tableA
B = select * from tableB

select A.x, B.y from A, B where A.x=B.x

Plus it's way faster because there are only two network requests... the join (iterative subqueries) is all done in memory. Note that if you have > 50 rows in the first table it iterates over then you need to change maxNestedRequests to a larger number (that defines the number of subqueries executed).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

odd join behavior #572

odd join behavior #572

sggraham32 commented Dec 4, 2012

mbellmbell commented Dec 4, 2012

odd join behavior #572

odd join behavior #572

Comments

sggraham32 commented Dec 4, 2012

mbellmbell commented Dec 4, 2012