USQL SUM aggregator returns null when used with user-defined aggregators #51

Open
supernoodles opened this issue Jul 22, 2017 · 2 comments

@supernoodles

supernoodles commented Jul 22, 2017

Consider the following USQL script, which reproduces the issue we've observed in our production code:

@data = 
    SELECT
        *
    FROM (VALUES
        ("A",(decimal?)2,   "LabelA","A:1"),
        ("A",(decimal?)null,"LabelA","A:2"),
        ("A",(decimal?)1,   "LabelA","A:3"),
        ("B",(decimal?)4,   "LabelB","B:1")) AS T(Name, Value, Type, Id);

@result = 
    SELECT 
        Name,
        SUM(Value) AS Sum,
        Type,
        AGG<AggTest.genericAggregator>(Name, string.Empty) AS RowId
    FROM @data
    GROUP BY Name, Type;

OUTPUT @result TO "/res.csv" USING Outputters.Csv(outputHeader:true);

The aggregator in this case is the sample user-defined aggregator from the USQL reference documentation (our production code is different, but the problem can be reproduced with the sample UDAGG code):

using Microsoft.Analytics.Interfaces;

namespace AggTest
{
    public class genericAggregator : IAggregate<string, string, string>
    {
        string AggregatedValue;

        public override void Init()
        {
            AggregatedValue = "";
        }

        public override void Accumulate(string ValueToAgg, string GroupByValue)
        {
            AggregatedValue += ValueToAgg + ",";
        }

        public override string Terminate()
        {
            // remove last comma
            return AggregatedValue.Substring(0, AggregatedValue.Length - 1);
        }
    }
}
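
To make the aggregator's lifecycle concrete, here is a small Python model (not U-SQL or C#) of how we assume the runtime drives it: a fresh instance per group, `Init` once, `Accumulate` once per row, then `Terminate`. `GenericAggregatorModel` and `run_per_group` are hypothetical names used only for illustration; the sketch just reproduces the `RowId` values reported in the output that follows.

```python
class GenericAggregatorModel:
    """Hypothetical Python stand-in for the C# AggTest.genericAggregator."""

    def init(self):
        # Mirrors Init(): start each group with an empty string
        self.aggregated_value = ""

    def accumulate(self, value_to_agg, group_by_value):
        # Mirrors Accumulate(): group_by_value is unused, as in the C# sample
        self.aggregated_value += value_to_agg + ","

    def terminate(self):
        # Mirrors Terminate(): drop the trailing comma from the last accumulate()
        return self.aggregated_value[:-1]


def run_per_group(rows, key_fn, arg_fn):
    """Hypothetical driver: a fresh aggregator per group, init/accumulate/terminate."""
    groups = {}
    for row in rows:
        groups.setdefault(key_fn(row), []).append(row)
    results = {}
    for key, group_rows in groups.items():
        agg = GenericAggregatorModel()
        agg.init()
        for row in group_rows:
            agg.accumulate(arg_fn(row), "")  # like AGG<...>(Name, string.Empty)
        results[key] = agg.terminate()
    return results


# The same four rows as @data: (Name, Value, Type, Id)
DATA = [
    ("A", 2, "LabelA", "A:1"),
    ("A", None, "LabelA", "A:2"),
    ("A", 1, "LabelA", "A:3"),
    ("B", 4, "LabelB", "B:1"),
]

# GROUP BY Name, Type; aggregate the Name column
row_ids = run_per_group(DATA, key_fn=lambda r: (r[0], r[2]), arg_fn=lambda r: r[0])
print(row_ids)  # {('A', 'LabelA'): 'A,A,A', ('B', 'LabelB'): 'B'}
```

Row order within a group is not significant here, since every row in group A contributes the same string.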

When executed, either within an ADLA instance in Azure or using the USQL local run environment within Visual Studio 2017, the result is:

"Name","Sum","Type","RowId"
"A",,"LabelA","A,A,A"
"B",4,"LabelB","B"

The built-in USQL SUM aggregator returns NULL for the row with Name A instead of the expected 3 (SUM should skip the NULL and add 2 + 1). Removing the call to the user-defined aggregator produces a rowset with the expected SUM value of 3:

"Name","Sum","Type"
"A",3,"LabelA"
"B",4,"LabelB"

This is clearly inconsistent behaviour for the SUM aggregate, which shouldn't care whether a UDAGG is computed over the same group.
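
For reference, the behaviour we expect from SUM follows the usual SQL rule: NULL inputs are skipped, and the result is NULL only when a group has no non-NULL values at all. A minimal Python model of that rule (the function name `sql_sum` is hypothetical and purely illustrative) gives 3 for group A, independent of whatever other aggregates appear in the SELECT list:

```python
from decimal import Decimal


def sql_sum(values):
    """Model of SQL SUM semantics: skip NULLs (None); all-NULL input gives NULL."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null, Decimal(0))


group_a = [Decimal(2), None, Decimal(1)]  # Value column for Name = "A"
group_b = [Decimal(4)]                    # Value column for Name = "B"

print(sql_sum(group_a))  # 3
print(sql_sum(group_b))  # 4
```
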

Interestingly, if the @result query is modified to:

@result = 
    SELECT 
        Name,
        AVG(Value) AS Avg,
        SUM(Value) AS Sum,
        Type,
        AGG<AggTest.genericAggregator>(Name, string.Empty) AS RowId
    FROM @data
    GROUP BY Name, Type;

Then the output is:

"Name","Avg","Sum","Type","RowId"
"A",1.5,3,"LabelA","A,A,A"
"B",4,4,"LabelB","B"

In this case, adding the AVG aggregator not only produces the expected average but also coaxes the SUM aggregator into producing the correct answer!

At present we've implemented two workarounds:

  1. Use the null-coalescing operator within the SUM, i.e.:
     SUM(Value ?? 0.0m) AS Sum
  2. Filter the rowset so that NULLs in the field being summed aren't included in the group (the disadvantage is that if a single query uses more than one SUM aggregator and several of the summed fields may contain NULLs, this adds complication, since filtering out a row with a NULL in one field also removes its values from the other SUMs).

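Obviously neither workaround is ideal, as we may care that the result of a SUM aggregate is actually NULL when there are no non-NULL values to sum in the group: with the first workaround such a group sums to 0, and with the second it disappears from the output altogether.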
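
The trade-off with the first workaround can be seen in a small Python model (illustrative only, not U-SQL; `sql_sum` and `coalesced_sum` are hypothetical names). On a group with a mix of values both approaches agree, but on an all-NULL group SQL-style SUM yields NULL (`None`) while the coalesced version yields 0:

```python
from decimal import Decimal


def sql_sum(values):
    """SQL-style SUM: ignore None; return None if nothing is left to add."""
    non_null = [v for v in values if v is not None]
    return sum(non_null, Decimal(0)) if non_null else None


def coalesced_sum(values):
    """Models SUM(Value ?? 0.0m): every None is replaced by 0.0 before summing."""
    return sum((v if v is not None else Decimal("0.0") for v in values), Decimal(0))


mixed = [Decimal(2), None, Decimal(1)]
all_null = [None, None]

print(sql_sum(mixed), coalesced_sum(mixed))        # 3 3.0
print(sql_sum(all_null), coalesced_sum(all_null))  # None 0.0
```

The trailing `.0` comes from Python's `Decimal` keeping the scale of the `0.0` literal through the addition; the values still compare equal.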

@supernoodles
Author

No response in 10 days from anyone... :-(

Now raising as a support request from within the Azure Portal.

@saveenr

saveenr commented Aug 2, 2017

We apologize for the late response :-( but thank you for raising this. The engineering team is investigating.
