dwsmith1983/space-filling-curves

Space Filling Curves

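Space-filling curves let us map points in n-dimensional space onto a one-dimensional curve while largely preserving locality: points that are close together in the original space tend to stay close together along the curve. Techniques such as Z-ordering use this property to help big data platforms store and process large datasets efficiently, because related rows end up near each other.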

  1. Processing Petabytes of Data in Seconds with Databricks Delta
  2. Z-order curve
  3. Z-order indexing for multifaceted queries in Amazon DynamoDB: Part 1
  4. Z-order indexing for multifaceted queries in Amazon DynamoDB: Part 2
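
To make the locality idea concrete, here is a minimal sketch of a two-dimensional Morton (Z-order) key built by interleaving the bits of two coordinates. It is an illustration only and is not taken from this library: the object name `MortonSketch`, the methods `encode2D` and `sortByZ`, and the 16-bit default width are all assumptions made for the example, and it only handles non-negative integers.

```scala
// Illustrative sketch only; not this project's implementation.
// All names here (MortonSketch, encode2D, sortByZ) are hypothetical.
object MortonSketch {

  /** Interleave the low `bits` bits of x and y into a single key.
    * Bit i of x lands on bit 2*i, bit i of y lands on bit 2*i + 1.
    * Assumes x and y are non-negative and fit in `bits` bits.
    */
  def encode2D(x: Int, y: Int, bits: Int = 16): Long = {
    var z = 0L
    for (i <- 0 until bits) {
      z |= ((x >> i) & 1L) << (2 * i)
      z |= ((y >> i) & 1L) << (2 * i + 1)
    }
    z
  }

  /** Order points along the Z curve by sorting on their Morton key. */
  def sortByZ(points: Seq[(Int, Int)]): Seq[(Int, Int)] =
    points.sortBy { case (x, y) => encode2D(x, y) }
}
```

For example, `MortonSketch.encode2D(3, 3)` gives 15, and `MortonSketch.sortByZ(Seq((2, 3), (0, 0), (1, 1), (3, 0), (0, 1)))` returns the points in the order (0, 0), (0, 1), (1, 1), (3, 0), (2, 3). Sorting on a single interleaved key is what allows a one-dimensional ordering to keep nearby points from several dimensions close together.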

Available GitHub Packages

  • Spark-2.3.1 on Scala 2.11.12
  • Spark-2.4.7 on Scala 2.11.12 and Scala 2.12.13
  • Spark-3.1.0 on Scala 2.12.13 with Java 11

Usage

Given the DataFrame below, we want to Morton-order (Z-order) the data by the columns id, x, and y:

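// Currently, this isn't set up to use Maven.
// For now, publish locally or build the assembly and use the jar.
import org.apache.spark.sql.DataFrame
// Assumes a SparkSession named `spark` is in scope (as in spark-shell); toDF needs its implicits.
import spark.implicits._
// Morton comes from this project's jar; import it from the package it is built under.
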
val orderingCols: Array[String] = Array("id", "x", "y")
val df: DataFrame = Seq(
  (1, 1, 12.23, "a", "m"),
  (4, 9, 5.05, "b", "m"),
  (3, 0, 1.23, "c", "f"),
  (2, 2, 100.4, "d", "f"),
  (1, 25, 3.25, "a", "m")
).toDF("x", "y", "amnt", "id", "sex")

val mortonOrdering: Morton = new Morton(df, orderingCols)
// This sorts the whole DataFrame by the z_index column.
val zIndexedDF: DataFrame = mortonOrdering
  .mortonIndex.sort("z_index")

Work in Progress

  • README
  • Better organization
  • Add other space filling curves: Hilbert, GeoHash, Peano

Help Needed

Looking for help from anyone experienced in writing good READMEs and publishing packages to Maven.