CustomTransform RequiredColumns & AddedColumns are case sensitive #153

labbedaine · 2022-11-18T17:38:46Z

Hi.

Is there a way to turn off case sensitivity for requiredColumns and addedColumns? Even with spark.sql.caseSensitive set to false, my unit test still fails.

sparkSession.conf.set("spark.sql.caseSensitive", false)

test("CustomTransform RequiredColumns & AddedColumns are case sensitive") {
  val lowercaseDF = spark.createDF(List(("Hello, world")), List(("lowercase", StringType, false)))

  lowercaseDF
    .trans(
      CustomTransform(
        requiredColumns = Seq("LOWERCASE"),
        transform = withTest(),
        addedColumns = Seq("test"),
      )
    )

  def withTest()(df: DataFrame): DataFrame = {
    df.withColumn("test", lit("A simple test."))
  }
}

The [LOWERCASE] columns are not included in the DataFrame with the following columns [lowercase]
com.github.mrpowers.spark.daria.sql.MissingDataFrameColumnsException: The [LOWERCASE] columns are not included in the DataFrame with the following columns [lowercase]
at com.github.mrpowers.spark.daria.sql.DataFrameColumnsChecker.validatePresenceOfColumns(DataFrameColumnsChecker.scala:19)

Thank you!


brayanjuls · 2022-12-05T19:38:33Z

Hi,

This seems to be a problem with how the library is validating the columns. I can go ahead and fix this problem by applying the following change if @MrPowers agrees with that.

In com.github.mrpowers.spark.daria.sql.DataFrameColumnsChecker, I would replace val missingColumns = requiredColNames.diff(df.columns.toSeq) with:

    val givenColumns = df.columns.toSeq.map(_.toLowerCase)
    val requiredColumnsLower = requiredColNames.map(_.toLowerCase)
    val missingColumns = requiredColumnsLower.diff(givenColumns)

That way the check stays O(n) in the number of columns and the comparison no longer depends on case.
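For illustration, here is a minimal standalone sketch of the same idea that works on plain column-name sequences, so it runs without Spark. The object name ColumnCheckSketch, the function missingColumns, and the caseSensitive parameter are hypothetical and not part of spark-daria. It differs from the snippet above in two ways that are my own assumptions about what would be desirable: it keeps the caller's original spelling in the result, so an error message would still show LOWERCASE rather than lowercase, and it only ignores case when asked to.

```scala
import java.util.Locale

// Hypothetical helper, not part of spark-daria: a sketch of a
// column-presence check that can optionally ignore case.
object ColumnCheckSketch {

  // Returns the required names that are absent from `actual`,
  // preserving the spelling the caller used in `required`.
  def missingColumns(
      required: Seq[String],
      actual: Seq[String],
      caseSensitive: Boolean
  ): Seq[String] = {
    // Normalize with Locale.ROOT so the result does not depend on the JVM's default locale.
    def normalize(name: String): String =
      if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

    // A Set gives constant-time lookups, so the whole check is linear
    // in the number of column names.
    val present = actual.map(normalize).toSet
    required.filterNot(name => present.contains(normalize(name)))
  }
}

// The scenario from the failing test: the DataFrame has "lowercase",
// the transform requires "LOWERCASE".
val ignoringCase = ColumnCheckSketch.missingColumns(Seq("LOWERCASE"), Seq("lowercase"), caseSensitive = false)
val respectingCase = ColumnCheckSketch.missingColumns(Seq("LOWERCASE"), Seq("lowercase"), caseSensitive = true)
```

In a real patch the caseSensitive flag could presumably be taken from the session's spark.sql.caseSensitive setting instead of always lowercasing, so that users who turn case sensitivity on keep the current behavior.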
