-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement - Make the number of Columns in LoadColumnStatistics configurable #131
Comments
Hi Cory, this is a good point - however, as you can imagine, there are multiple steps involved:
Marco |
@marcosqlbi we can definitely add the new parameter to the public API. To avoid breaking changes for current users, we'll keep the default value at 50. This way, existing setups will continue to function as they do now. |
I know there is work involved and I know the best solution should be implemented. I have a few suggestions that may speed up the time of development to implement. While this may not seem like the best solution in the end, it would allow for immediate use any existing client tools. It would also allow me to run a full VertiPaq scan on my existing model sooner! The application could read from an appSetting from the app.Config. The logic could then use the configuration value instead of the hardcoded 50 and if it didn't exist in the app.config it could default to 50. For example, DaxStudio.exe.config an entry could be added such as .The same could be done in Tabular Editor 3. Later this could become more of a parameter that the client tools could pass in. In the end, this may rarely be needed, so exposing it in the UI of client tools would be fine, but most often not used. If it ends up there at some point great, but I would currently be happy with this behind the scenes configuration approach as a starting point. This suggestion is really an effort to make this a simple modification and expedite the time it takes. Unless I'm missing something, this may only be a few lines of code. I'm curious of others thoughts on this idea. |
@corycundy I am not sure that reading the config from a multi-platform library is possible, and even in that case, I do not that idea much! |
@corycundy I agree with Marco as implementing it in a multi-platform context could be complex and difficult to maintain. The VPAX library can't always know how the host application handles its configuration, and it might not always be through an app.config file. Furthermore, modifying a configuration file in the program's installation directory usually requires admin privileges, which can be difficult for many users. Ideally, this parameter should be available as advanced settings in the host application's UI, but exposing that parameter might take some time. Meanwhile, I was wondering if you could use our command-line tool Dax.Vpax.CLI, which we recently released. We can definitely add this parameter to the CLI and get it out soon. Would you be able to use this tool to extract your VPAX files temporarily until the new setting is available in other tools? |
@albertospelta could you let me know once the public VertiPaq Analyzer NuGet contains this parameter? Then we'll make sure to surface it in TE3's Preferences dialog. |
@otykier Sure thing! I’ll let you know as soon as the NuGet is available. Thanks! |
@corycundy The new parameter If you would like, you can use this setting with the command-line tool, which now supports it. To get started, install the CLI by running |
Hello @albertospelta , this change is great and super timely. Thank you so much for implementing this! I was able to call the CLI version with this new parameter. However, I unfortunately stumbled upon a few potential bugs. I was not able to get --column-batch-size to work with a value of 1, which is my ultimate goal. Even 2 causes me issues in my current Direct Lake model. We are hitting memory error and are working with Microsoft CAT and support on this. I got the following error from the command line. This was against an Adventure Works Internet Sales. Extracting VPAX metadata...Unhandled exception: Microsoft.AnalysisServices.AdomdClient.AdomdErrorResponseException: The end of the input was reached. I was able to get the following from DAX Studio. The end of the input was reached. Query Text: Looking at the code, I don't see how this can happen unless the set of validColumns is slightly different than the results of the .Where statement that looks at both IsRowNumber and GroupByColumns. I tested multiple models with a parameter of 1 and received this error. Secondly, on one Direct Lake model, I also received the following error which seems to be related. By looking at the code, I also don't understand this error either. Extracting VPAX metadata...Unhandled exception: Microsoft.AnalysisServices.AdomdClient.AdomdErrorResponseException: Query (1, 10) Too few arguments were passed to the UNION function. The minimum argument count for the function is 2. I was able to get the following from DAX Studio. Query Text: I couldn't reproduce this one on a second model. It somehow is adding the UNION with only column. I don't believe I have any Group By columns in any tables in the models I am testing with. I hope that the fix to the first one turns out fixing this one as well. Please let me know if you have any questions that I can help with regarding this. I would appreciate any additional help working through these issues. Cory |
Hi @corycundy,
Thanks |
@albertospelta, thanks for the quick response. I used 3 for the --column-batch-size when I received that error. Here is the command line with pathing and connection details modified. C:\Users\Cory.Cundy>vpax export "C:\Temp\ModelName.vpax" "Provider=MSOLAP;Data Source=powerbi://api.powerbi.com/v1.0/myorg/WorkspaceName;Initial Catalog=ModelName" --column-batch-size 3 As far as Group By Columns, it turns out that there was one single column in the model that did have a Group By Column assigned. Cory |
@albertospelta we saw a similar error as Cory when reducing the column batch size below 5. I'm guessing a check is needed somewhere, to ensure that UNION is only added when the column batch contains 2 or more columns. |
@otykier @corycundy The NuGet package version 1.7.1 and the CLI tool version 1.2.2 have just been released and include fix for the incorrect UNION syntax. Please let me know if this has resolved the issues, thanks! |
@albertospelta, my testing on both issues has been successful. I am no longer getting either error. One other observation that I made is that similar logic is used in the LoadTableStatistics method where 50 is hard coded. Thanks again for the quick turnaround time on this! |
@corycundy happy to let you know that the new Column Batch Size option is now available in Tabular Editor 3.17.0, using the latest VertiPaq Analyzer NuGet (1.7.1). |
I am closing this as the new parameter has been added to the settings in both Tabular Editor 3 and DAX Studio. We have created a dedicated issue here to track the suggestion for adding a new parameter for the number of tables as well. Thanks @corycundy |
Hello, I am running into a problem when trying to run VertiPaq Analyzer against a Direct Lake model. The model is much larger than the SKU size and our capacity is constantly running out of memory. This is being addressed from a model standpoint, however, in the mean time I wanted to get a full view of the columns. I ran the VertiPaq Analyzer both from DAX Studio and Tabular Editor 3. In DAX Studio I set Read statistics from data and in TE3 I used Read Statistics from data and set Statistics from Direct Lake models to Full.
The problem is that the there are many large columns that can easily fill up the memory. The VertiPaq Analyzer appears that it tries to get column statistics for 50 columns at a time. This simply doesn't work as running the EVALUATE UNION ( ROW ( .... over 50 large columns exceeds the SKU memory limit for a model. I strongly believe this would work fine if I was able to adjust the 50 down to a much lower number, maybe even 1. I understand the optimization to run one EVALUATE for 50 columns at a time, for my specific instance, I'd rather see it churn through 50 individual queries.
I can't imagine there is any way to configure this at the moment as 50 is hard coded into the code. My request would be that this could be configurable even if it is possible via a config file vs. a UI setting in DAX Studio or TE3.
Here is the code that I believe is causing 50 columns to be read. The profiler for some reason only had 49 in the DAX query.
Here is an example of the DAX query from the profiler.
The text was updated successfully, but these errors were encountered: