There are two issues with how XTable parses the commit log of source Delta tables that have the deletion vectors property set.
Missing tightBounds Property: For Delta tables with deletion vectors, the per-file stats include an additional property called tightBounds. XTable's representation of the Delta stats (DeltaStatsExtractor.DeltaStats) does not declare this property, so Jackson rejects it as an unrecognized field and parsing the commit log fails:
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "tightBounds" (class org.apache.xtable.delta.DeltaStatsExtractor$DeltaStats), not marked as ignorable (4 known properties: "nullCount", "numRecords", "maxValues", "minValues"])
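One possible way to avoid this failure is to declare the field and also tolerate unknown stats properties. The class below is a hypothetical stand-in for DeltaStats, not XTable's actual code; the field types and the parse helper are assumptions made for illustration, and it needs jackson-databind on the classpath.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical stand-in for DeltaStatsExtractor.DeltaStats; the real class may differ.
// ignoreUnknown = true keeps parsing from failing if Delta adds further stats fields.
@JsonIgnoreProperties(ignoreUnknown = true)
class DeltaStatsSketch {
  public long numRecords;
  public Map<String, Object> minValues;
  public Map<String, Object> maxValues;
  public Map<String, Object> nullCount;
  // Declared explicitly so the value is readable; Boolean stays null when the field is absent.
  public Boolean tightBounds;

  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Assumed helper: parses the JSON string stored in an add action's stats field.
  static DeltaStatsSketch parse(String statsJson) {
    try {
      return MAPPER.readValue(statsJson, DeltaStatsSketch.class);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Declaring tightBounds fixes the reported error directly, while the ignoreUnknown annotation is an optional safety net against future additions to the stats schema.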
Incorrect Handling of Deletion Vectors: When a deletion vector is added for a data file in Delta Lake, the commit log contains both a remove and an add entry for the same data file. This is done to link the deletion vector file to the data file. However, XTable incorrectly adds the data file path to both the new and removed file sets in FileDiff. XTable should ignore this pair, since no new data file is generated. Instead, once a representation of deletion vectors is added, it should report the addition of a deletion vector. For example:
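As a rough illustration of the intended behaviour (not taken from the issue itself), the sketch below treats any path that appears in both the add and remove actions of one commit as a deletion vector update and keeps it out of both file sets. FileDiffSketch and fromCommit are hypothetical names, not XTable's FileDiff API, and a real implementation would likely also inspect the deletion vector descriptor on the add action instead of relying on path equality alone.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for XTable's FileDiff.
class FileDiffSketch {
  final Set<String> filesAdded;
  final Set<String> filesRemoved;
  // Data files whose only change in this commit is a new deletion vector.
  final Set<String> deletionVectorUpdates;

  FileDiffSketch(Set<String> added, Set<String> removed, Set<String> dvUpdates) {
    this.filesAdded = added;
    this.filesRemoved = removed;
    this.deletionVectorUpdates = dvUpdates;
  }

  // addPaths / removePaths: data file paths from the add and remove actions of one commit.
  static FileDiffSketch fromCommit(Set<String> addPaths, Set<String> removePaths) {
    // A path present in both sets was re-added only to attach a deletion vector.
    Set<String> dvUpdates = new HashSet<>(addPaths);
    dvUpdates.retainAll(removePaths);

    Set<String> added = new HashSet<>(addPaths);
    added.removeAll(dvUpdates);

    Set<String> removed = new HashSet<>(removePaths);
    removed.removeAll(dvUpdates);

    return new FileDiffSketch(added, removed, dvUpdates);
  }
}
```

Keeping the overlapping paths in a separate set leaves room to report them as deletion vector additions once such a representation exists.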
Feature Request / Improvement