Space-filling curves let us map n-dimensional data onto a
one-dimensional curve while largely preserving locality. Techniques such as
Z-ordering allow big data platforms to store and process
large chunks of data efficiently.
- Processing Petabytes of Data in Seconds with Databricks Delta
- Z-order curve
- Z-order indexing for multifaceted queries in Amazon DynamoDB: Part 1
- Z-order indexing for multifaceted queries in Amazon DynamoDB: Part 2
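To make the locality idea concrete, here is a minimal, standalone sketch of two-dimensional Morton encoding by bit interleaving. `MortonSketch` and `interleave` are hypothetical names used only for illustration; they are not part of this library's API, and the library's own `Morton` class may compute its index differently. The sketch assumes non-negative integers below 2^16 and only two dimensions.

```scala
// Illustrative only: not this library's implementation.
object MortonSketch {
  // Interleave the low 16 bits of x and y:
  // bit i of x lands on bit 2*i, bit i of y lands on bit 2*i + 1.
  def interleave(x: Int, y: Int): Long = {
    var z = 0L
    var i = 0
    while (i < 16) {
      z |= ((x >> i) & 1).toLong << (2 * i)
      z |= ((y >> i) & 1).toLong << (2 * i + 1)
      i += 1
    }
    z
  }

  def main(args: Array[String]): Unit = {
    // The (x, y) pairs from the sample rows used later in this README.
    val points = Seq((1, 1), (4, 9), (3, 0), (2, 2), (1, 25))
    // Sorting by the interleaved key walks the points along the Z-order curve.
    points.sortBy { case (x, y) => interleave(x, y) }.foreach(println)
    // prints (1,1), (3,0), (2,2), (4,9), (1,25), one per line
  }
}
```

Points that are close in both x and y tend to get nearby keys, which is why sorting by such a key helps range queries over several columns at once.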
Spark-2.3.1 on Scala 2.11.12
Spark-2.4.7 on Scala 2.11.12 and Scala 2.12.13
Spark-3.1.0 on Scala 2.12.13 Java 11
Given the DataFrame below, we want to Morton (Z-order) our data by id, x, and y.
// This isn't set up for Maven yet.
// For now, publish locally or build an assembly and use the jar.
import org.apache.spark.sql.DataFrame
// toDF needs the implicits of an active SparkSession, assumed here to be named `spark`
import spark.implicits._

// columns to Z-order by
val orderingCols: Array[String] = Array("id", "x", "y")
val df: DataFrame = Seq(
  (1, 1, 12.23, "a", "m"),
  (4, 9, 5.05, "b", "m"),
  (3, 0, 1.23, "c", "f"),
  (2, 2, 100.4, "d", "f"),
  (1, 25, 3.25, "a", "m")
).toDF("x", "y", "amnt", "id", "sex")
val mortonOrdering: Morton = new Morton(df, orderingCols)
// this will order your whole DataFrame by the z_index
val zIndexedDF: DataFrame = mortonOrdering
  .mortonIndex.sort("z_index")
- README
- Better organization
- Add other space filling curves: Hilbert, GeoHash, Peano
Looking for help from anyone experienced in writing decent READMEs and publishing code to Maven.