You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Consider the following USQL (reflecting the issue we've observed in our production development):
@data =
SELECT
*
FROM (VALUES
("A",(decimal?)2, "LabelA","A:1"),
("A",(decimal?)null,"LabelA","A:2"),
("A",(decimal?)1, "LabelA","A:3"),
("B",(decimal?)4, "LabelB","B:1")) AS T(Name, Value, Type, Id);
@result =
SELECT
Name,
SUM(Value) AS Sum,
Type,
AGG<AggTest.genericAggregator>(Name, string.Empty) AS RowId
FROM @data
GROUP BY Name, Type;
OUTPUT @result TO "/res.csv" USING Outputters.Csv(outputHeader:true);
The aggregator in this case is the the sample custom aggregator from the USQL reference doc (our production code is different, but the problem is demonstrable with the sample UDAGG code):
using Microsoft.Analytics.Interfaces;
namespace AggTest
{
public class genericAggregator : IAggregate<string, string, string>
{
string AggregatedValue;
public override void Init()
{
AggregatedValue = "";
}
public override void Accumulate(string ValueToAgg, string GroupByValue)
{
AggregatedValue += ValueToAgg + ",";
}
public override string Terminate()
{
// remove last comma
return AggregatedValue.Substring(0, AggregatedValue.Length - 1);
}
}
}
When executed, either within an ADLA instance in Azure or using the USQL local run environment within Visual Studio 2017, the result is:
The built-in USQL SUM aggregator has returned NULL rather than the expected output of 3 for the row with Name A. Removing the call to the user-defined aggregator returns a rowset with the expected SUM aggregation value of 3:
"Name","Sum","Type"
"A",3,"LabelA"
"B",4,"LabelB"
This is clearly inconsistent behaviour for the SUM aggregate which shouldn't care if a UDAGG is included in the processing of the same group.
Interestingly, if the @Result query is modified to:
@result =
SELECT
Name,
AVG(Value) AS Avg,
SUM(Value) AS Sum,
Type,
AGG<AggTest.genericAggregator>(Name, string.Empty) AS RowId
FROM @data
GROUP BY Name, Type;
In this case the introduction of the AVG aggregator both produces the expected average as well as coaxing the SUM aggregator into also producing the correct answer!
At present we've implemented two workarounds:
Use the null coalescing operator within the SUM, i.e.:
SUM(Value ?? 0.0m) AS Sum
Filter the rowset to ensure that NULLs in the field to sum aren't included in the group (this has the disadvantage that if more than one SUM aggregator is used in a single query and each of the fields may have NULLs then this adds complication).
Obviously neither work around is ideal as we may care that the result of a SUM aggregate is actually NULL if there were no non-NULL values to sum in the group.
The text was updated successfully, but these errors were encountered:
Consider the following USQL (reflecting the issue we've observed in our production development):
The aggregator in this case is the the sample custom aggregator from the USQL reference doc (our production code is different, but the problem is demonstrable with the sample UDAGG code):
When executed, either within an ADLA instance in Azure or using the USQL local run environment within Visual Studio 2017, the result is:
The built-in USQL SUM aggregator has returned NULL rather than the expected output of 3 for the row with Name A. Removing the call to the user-defined aggregator returns a rowset with the expected SUM aggregation value of 3:
This is clearly inconsistent behaviour for the SUM aggregate which shouldn't care if a UDAGG is included in the processing of the same group.
Interestingly, if the @Result query is modified to:
Then the output is:
In this case the introduction of the AVG aggregator both produces the expected average as well as coaxing the SUM aggregator into also producing the correct answer!
At present we've implemented two workarounds:
Obviously neither work around is ideal as we may care that the result of a SUM aggregate is actually NULL if there were no non-NULL values to sum in the group.
The text was updated successfully, but these errors were encountered: